REPLICATION COMPETENT HEPATITIS C VIRUS AND METHODS OF USE

CONTINUING APPLICATION DATA 

This application claims the benefit of U.S. Provisional Application Serial
No. 60/525,989, filed December 1, 2003, which is incorporated by reference
herein.

GOVERNMENT FUNDING 
The present invention was made with government support under Grant
Nos. U19-AI40035 and N01-AI25488, awarded by the National Institute of
Allergy and Infectious Diseases. The Government has certain rights in this
invention.

BACKGROUND 

Hepatitis C virus is the most common cause of chronic viral hepatitis
within the United States, infecting approximately 4 million Americans and
responsible for the deaths of 8,000-10,000 persons annually due to progressive
hepatic fibrosis leading to cirrhosis and/or the development of hepatocellular
carcinoma. Hepatitis C virus is a single-stranded, positive-sense RNA virus
with a genome length of approximately 9.6 kb. It is currently classified within
a separate genus of the flavivirus family, the genus Hepacivirus. The hepatitis
C virus genome contains a single large open reading frame (ORF) that follows a
5' non-translated RNA of approximately 342 bases containing an internal
ribosome entry segment (IRES) directing cap-independent initiation of viral
translation. The large ORF encodes a polyprotein which undergoes
post-translational cleavage, under control of cellular and viral proteinases.
This yields a series of structural proteins which include a core or
nucleocapsid protein, two envelope glycoproteins, E1 and E2, and at least six
nonstructural replicative proteins. These include NS2 (which with the adjacent
NS3 sequence demonstrates cis-active metalloproteinase activity at the NS2/NS3
cleavage site), NS3 (a serine proteinase/NTPase/RNA helicase), NS4A (serine
proteinase accessory factor), NS4B, NS5A, and NS5B (RNA-dependent RNA
polymerase).

With the exception of the 5' non-translated RNA, there is substantial
genetic heterogeneity among different strains of hepatitis C virus.
Phylogenetic analyses have led to the classification of hepatitis C virus
strains into a series of genetically distinct "genotypes," each of which
contains a group of genetically related viruses. The genetic distance between
some of these genotypes is large enough to suggest that there may be
biologically significant serotypic differences as well. There is little
understanding of the extent to which infection with a virus of any one genotype
might confer protection against viruses of a different genotype.

The currently available therapy of interferon in combination with
ribavirin has a poor response rate against the most prevalent strains of HCV,
genotypes 1a and 1b. Establishment of selectable subgenomic replicon systems
has advanced the study of HCV RNA replication. However, only replicons of
genotype 1b strains are readily available, and extension of replicon systems to
other genotypes has been largely unsuccessful. Considering the high genetic
variability of HCV, HCV replication systems derived from other genotypes will
be very helpful in the effort of drug discovery. In support of this notion,
chimeric replicons containing a genotype 1a polymerase in the background of a
genotype 1b replicon were more resistant to interferon treatment in vitro than
the replicon derived from a genotype 1b HCV. Extension of replicon systems to
other genotypes is also necessary to understand the mechanism of HCV RNA
replication and the contribution of variable sequences to that process.

Recently two groups reported the generation of genotype 1a replication
systems using highly permissive sublines of Huh-7 cells. Blight et al. (J.
Virol., 77, 3181-3190 (2003)) were able to select G418-resistant colonies
supporting replication of genotype 1a-derived subgenomic replicons in a
hyper-permissive Huh7 subline, Huh-7.5. This subline was generated by using
interferon-alpha treatment to cure an established G418-resistant replicon cell
line of the subgenomic Con1 replicon RNA that had been used to select it
(Blight et al., J. Virol., 76, 13001-13014 (2002)). Sequence analysis of
replicating HCV RNAs inside such selected cell lines showed that the most
common critical mutations were located at amino acid position 470 of NS3
(P1496L), within domain II of the NS3 helicase, and in NS5A (S2204I). In
another case, Grobler et al. (J. Biol. Chem., 278, 16741-16746 (Feb. 2003))
used a systematic mutational approach to reach the similar conclusion that the
combination of P1496L and S2204I was necessary to obtain genotype 1a
replication in a highly permissive Huh-7 subline which was selected in an
independent but similar way. However, genotype 1a RNAs with these two enhancing
mutations do not undergo replication in the Huh-7 cell line, indicating limited
usefulness of this system.

SUMMARY 

The present invention provides replication competent polynucleotides.
The replication competent polynucleotides include a 5' non-translated region
(NTR), a 3'NTR, and a first coding sequence present between the 5'NTR and
3'NTR and encoding a hepatitis C virus polyprotein. The 5'NTR, the 3'NTR, and
the nucleotide sequence encoding the polyprotein may be genotype 1a. The
polyprotein includes an isoleucine at about amino acid 2204, and further
includes an adaptive mutation. The adaptive mutation can be an arginine at
about amino acid 1067, an arginine at about amino acid 1691, a valine at about
amino acid 2080, an isoleucine at about amino acid 1655, an arginine at about
amino acid 2040, an arginine at about amino acid 1188, or a combination
thereof. The polyprotein may be a subgenomic polyprotein. The polyprotein may
include the cleavage products core, E1, E2, P7, NS2, NS3, NS4A, NS4B, NS5A,
and NS5B. The replication competent polynucleotides may further include a
second coding sequence. The second coding sequence can encode, for instance, a
marker or a transactivator. The replication competent polynucleotides may
further include a nucleotide sequence having cis-acting ribozyme activity,
wherein the nucleotide sequence is located 3' of the 3'NTR.

Also provided by the present invention are methods for making a
replication competent polynucleotide, and the resulting replication competent
polynucleotide. The methods include providing a polynucleotide having a 5'
NTR, a 3' NTR, and a first coding sequence present between the 5' NTR and 3'
NTR and encoding a hepatitis C virus polyprotein. Typically, the 5'NTR,
polyprotein, and 3'NTR are genotype 1a. The polyprotein includes a serine at
about amino acid 2204, a glutamine at about amino acid 1067, a lysine at about
amino acid 1691, a phenylalanine at about amino acid 2080, a valine at about
amino acid 1655, a lysine at about amino acid 2040, or a glycine at about
amino acid 1188. The method also includes altering the coding sequence such
that the polyprotein encoded thereby includes an isoleucine at amino acid
2204, and an adaptive mutation. The polyprotein may be a subgenomic
polyprotein. The polyprotein may include the cleavage products core, E1, E2,
P7, NS2, NS3, NS4A, NS4B, NS5A, and NS5B.

The present invention further provides methods for identifying a
compound that inhibits replication of a replication competent polynucleotide.
The method includes contacting a cell containing a replication competent
polynucleotide with a compound, incubating the cell under conditions wherein
the replication competent polynucleotide replicates in the absence of the
compound, and detecting the replication competent polynucleotide, wherein a
decrease of the replication competent HCV polynucleotide in the cell contacted
with the compound compared to the replication competent polynucleotide in a
cell not contacted with the compound indicates the compound inhibits
replication of the replication competent polynucleotide. The detecting of the
replication competent polynucleotide can include, for instance, nucleic acid
amplification or identifying a marker encoded by the replication competent
polynucleotide or by the cell containing the replication competent
polynucleotide.
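
The comparison between a treated cell and an untreated cell can be expressed
numerically. The following Python sketch is illustrative only: the function
names, the example signal values, and the 50% cutoff are assumptions made for
the example and are not part of the method described above. Any readout that is
proportional to the amount of replication competent polynucleotide (for
instance, an amplification signal or marker activity) could serve as the
signal.

```python
def percent_inhibition(treated_signal, untreated_signal):
    """Percent decrease of the replicon signal in treated cells
    relative to untreated control cells."""
    if untreated_signal <= 0:
        raise ValueError("untreated signal must be positive")
    return 100.0 * (1.0 - treated_signal / untreated_signal)


def inhibits_replication(treated_signal, untreated_signal, cutoff_pct=50.0):
    """Call a compound inhibitory when the decrease meets a cutoff.
    The 50% default is an arbitrary, hypothetical cutoff."""
    return percent_inhibition(treated_signal, untreated_signal) >= cutoff_pct


if __name__ == "__main__":
    # Hypothetical signals in arbitrary units.
    print(percent_inhibition(25.0, 100.0))
    print(inhibits_replication(90.0, 100.0))
```

In practice the cutoff would be chosen from the variability of replicate
untreated cultures rather than fixed in advance.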

Also provided by the present invention are methods for selecting a
replication competent polynucleotide. The method includes incubating a cell
containing a polynucleotide including a 5'NTR, a 3'NTR, and a first coding
sequence present between the 5'NTR and 3'NTR and encoding a hepatitis C virus
polyprotein, and a second coding sequence. The polyprotein includes an
isoleucine at about amino acid 2204, and further includes an adaptive
mutation. The second coding sequence encodes a selectable marker conferring
resistance to a selecting agent that inhibits replication of a cell that does
not express the selectable marker. The method also includes detecting a cell
that replicates in the presence of the selecting agent, wherein the presence of
such a cell indicates the polynucleotide is replication competent. The method
may further include obtaining a virus particle produced by the cell, exposing a
second cell to the isolated virus particle and incubating the second cell in
the presence of the selecting agent, and detecting a second cell that
replicates in the presence of the selecting agent, wherein the presence of such
a cell indicates the replication competent polynucleotide in the first cell
produces an infectious virus particle.

The present invention also provides methods for detecting a replication
competent polynucleotide, including incubating a cell containing a replication
competent polynucleotide. The replication competent polynucleotide includes a
5' NTR, a 3' NTR, and a first coding sequence present between the 5' NTR and
3' NTR and encoding a hepatitis C virus polyprotein, and a second coding
sequence encoding a transactivator. The cell includes a transactivated coding
region and an operator sequence operably linked to the transactivated coding
region, and the transactivated coding region encodes a detectable marker,
wherein the transactivator alters transcription of the transactivated coding
region. The method further includes detecting the detectable marker, wherein
the presence of the detectable marker indicates the cell includes a replication
competent polynucleotide.

Definitions 

As used herein, the term "replication competent polynucleotide" refers to a
polynucleotide that replicates when present in a cell. For instance, a
complementary polynucleotide is synthesized. As used herein, the term
"replicates in vitro" indicates the polynucleotide replicates in a cell that is
growing in culture. The cultured cell can be one that has been selected to grow
in culture, including, for instance, an immortalized or a transformed cell.
Alternatively, the cultured cell can be one that has been explanted from an
animal. "Replicates in vivo" indicates the polynucleotide replicates in a cell
within the body of an animal, for instance a primate (including a chimpanzee)
or a human. In some aspects of the present invention, replication in a cell can
include the production of infectious viral particles, i.e., viral particles
that can infect a cell and result in the production of more infectious viral
particles.

As used herein, the term "polynucleotide" refers to a polymeric form of
nucleotides of any length, either ribonucleotides or deoxynucleotides, and
includes both double- and single-stranded DNA and RNA. A polynucleotide may
include nucleotide sequences having different functions, including for instance
coding sequences, and non-coding sequences such as regulatory sequences and/or
non-translated regions. A polynucleotide can be obtained directly from a
natural source, or can be prepared with the aid of recombinant, enzymatic, or
chemical techniques. A polynucleotide can be linear or circular in topology and
can be, for example, a portion of a vector, such as an expression or cloning
vector, or a fragment.

The terms "coding region" and "coding sequence" are used
interchangeably and refer to a polynucleotide region that encodes a polypeptide
and, when placed under the control of appropriate regulatory sequences,
expresses the encoded polypeptide. The boundaries of a coding region are
generally determined by a translation start codon at its 5' end and a
translation stop codon at its 3' end. A coding region can encode one or more
polypeptides. For instance, a coding region can encode a polypeptide that is
subsequently processed into two or more polypeptides. A regulatory sequence or
regulatory region is a nucleotide sequence that regulates expression of a
coding region to which it is operably linked. Nonlimiting examples of
regulatory sequences include promoters, transcription initiation sites,
translation start sites, internal ribosome entry sites, translation stop sites,
and terminators. "Operably linked" refers to a juxtaposition wherein the
components so described are in a relationship permitting them to function in
their intended manner. A regulatory sequence is "operably linked" to a coding
region when it is joined in such a way that expression of the coding region is
achieved under conditions compatible with the regulatory sequence.

"Polypeptide" as used herein refers to a polymer of amino acids and does
not refer to a specific length of a polymer of amino acids. Thus, for example,
the terms peptide, oligopeptide, protein, polyprotein, proteinase, and enzyme
are included within the definition of polypeptide. This term also includes
post-expression modifications of the polypeptide, for example, glycosylations,
acetylations, phosphorylations and the like. A "hepatitis C virus polyprotein"
refers to a polypeptide that is post-translationally cleaved to yield more than
one polypeptide.

The terms "5' non-translated RNA," "5' non-translated region," "5'
untranslated region" and "5' noncoding region" are used interchangeably, and
are terms of art (see Bukh et al., Proc. Nat. Acad. Sci. USA, 89, 4942-4946
(1992)). The term refers to the nucleotides that are at the 5' end of a
replication competent polynucleotide.

The terms "3' non-translated RNA," "3' non-translated region," and "3'
untranslated region" are used interchangeably, and are terms of art. The term
refers to the nucleotides that are at the 3' end of a replication competent
polynucleotide.

Unless otherwise specified, "a," "an," "the," and "at least one" are used
interchangeably and mean one or more than one.

BRIEF DESCRIPTION OF THE FIGURES 

Figure 1. Organization of the selectable subgenomic dicistronic HCV
replicons, Bpp-Ntat2ANeo/SI (identical to Ntat2Aneo/SI in Yi et al., Virol.,
302, 197-210 (2002)), Htat2ANeo/SI, and Bpp-Htat2ANeo/SI, in which most of the
nonstructural protein-coding region and the 3'NTR are derived from the H77c
HCV genotype 1a sequence. The two large ORFs are shown as rectangles, with
nontranslated RNA segments shown as lines. The segment of the 3' ORF labeled
'pp' ('proximal protease') encodes the amino terminus of the NS3 protein
(residues 1 to 75). 'Bpp' indicates that this region is derived from the HCV
Con1 sequence. Both replicons contain the S2204I mutation in NS5A (S→I). 'δ'
indicates the hepatitis delta ribozyme sequence introduced downstream of the
3' terminus of the HCV sequence that produces an exact 3' end.



Figure 2. Transient HCV RNA replication assay. Shown is the expression
of SEAP by En5-3 cells following transfection with the chimeric 1a replicon
Bpp-Htat2ANeo/SI and Bpp-Htat2ANeo/KR/SI, which carries an additional K1691R
mutation in NS4A that was identified following selection of G418-resistant
cells following transfection with Bpp-Htat2ANeo/SI. As controls, SEAP
expression is shown following transfection of cells with the highly replication
competent 1b replicon, Bpp-Ntat2ANeo/SI, and a related replication defective
ΔGDD mutant; also shown is SEAP expression by normal En5-3 cells. Results shown
represent the mean values obtained from triplicate cultures transfected with
each RNA. SI, S2204I adaptive mutation; KR, K1691R adaptive mutation.

Figure 3. (A) Schematic depicting the organization of the 5' end of the
second ORF in subgenomic chimeric replicons containing most
(Bpp-H34A-Ntat2ANeo/SI) or all (Hpp-H34A-Ntat2ANeo/SI) of the H77 genotype 1a
NS34A-coding sequence in the background of the genotype 1b Bpp-Ntat2ANeo/SI.
Genotype 1a sequence (H77) is shown as an open box, genotype 1b sequence (Con1
or HCV-N) as a shaded box. 'Bpp' indicates the presence of genotype 1b sequence
from the Con1 strain of HCV in the 5' proximal protease coding sequence,
whereas 'Hpp' indicates that this sequence is derived from the genotype 1a H77
sequence. Approximate locations are shown for the adaptive mutations Q1067R
(Q→R) and G1188R (G→R), identified in G418-resistant cell clones selected
following transfection of Hpp-H34A-Ntat2ANeo/SI. (B) SEAP activity present in
supernatant culture fluids collected at 24 hr intervals following transfection
of En5-3 cells with various chimeric 1a-1b replicons including
Bpp-H34A-Ntat2ANeo/SI, Hpp-H34A-Ntat2ANeo/SI, Hpp-H34A-Ntat2ANeo/QR/SI, and
Hpp-H34A-Ntat2ANeo/GR/SI. Control cells were transfected with Bpp-Ntat2ANeo/SI
and the replication defective ΔGDD mutant. See legend to Fig. 2 for further
details. SI, S2204I adaptive mutation; QR, Q1067R adaptive mutation; and GR,
G1188R adaptive mutation.

Figure 4. Impact of adaptive mutations on replication competence of the
subgenomic genotype 1a replicon, Htat2ANeo/SI. (A) Location of various
adaptive mutations within the second ORF (derived entirely from the genotype 1a
H77 sequence): Q1067R, P1496L (NS3); K1691R (NS4A); and F2080V and S2204I
(NS5A). (B) Transient HCV RNA replication assay. SEAP activity in culture
supernatants collected at 12-24 hr intervals following electroporation of
En5-3 cells with the 1a replicon Htat2ANeo carrying the indicated combinations
of the adaptive mutations shown in panel A. Cells were also transfected with
genotype 1b Bpp-Ntat2ANeo/SI replicon RNA as a reference. (C) Summary of the
replication phenotypes of genotype 1a replicon Htat2ANeo RNAs containing
various combinations of adaptive mutations: (-) no detectable replication, (+)
modest increase in SEAP expression above background days 3-5, and (+++)
>10-fold increase in SEAP expression above background 7 days after
transfection in the transient replication assay (see panel B). SI, S2204I
adaptive mutation; QR, Q1067R adaptive mutation; PL, P1496L adaptive mutation;
KR, K1691R adaptive mutation; and FV, F2080V adaptive mutation.

Figure 5. Adaptive mutations within the polyprotein do not influence the
efficiency of polyprotein translation under control of the EMCV IRES. Shown is
an SDS-PAGE gel loaded with products of in vitro translation reactions
programmed with RNAs derived from Bpp-Ntat2ANeo (lane 1), Bpp-Htat2ANeo
(lanes 2 and 3), Htat2ANeo (lanes 4 to 8), or Bpp-Ntat2ANeo/ΔGDD (lane 9)
RNAs carrying various combinations of adaptive mutations (Q1067R, K1691R,
F2080V, or S2204I) as indicated. The schematic at the top of the figure
indicates the location of these mutations within the polyprotein. 'pp'
indicates the RNA segment encoding the amino terminal 75 residues of NS3, while
'NS' indicates the remainder of the RNA segment encoding the nonstructural
proteins. H = genotype 1a H77 sequences, B = genotype 1b Con1 sequences, and
N = genotype 1b HCV-N sequences. The locations of the NS3 and Neo products are
indicated at the side of the gel.

Figure 6. Impact of additional adaptive mutations on replication
competence of the subgenomic genotype 1a replicon, Htat2ANeo/QR/KR/SI (see
Fig. 4). (A) Location of various adaptive mutations within the second ORF
(derived entirely from the genotype 1a H77 sequence): Q1067R, V1655I (NS3);
K1691R (NS4A); and K2040R (KR5A), F2080V, and S2204I (NS5A). (B) Transient HCV
RNA replication assay. SEAP activity in culture supernatants collected at
12-24 hr intervals following electroporation of En5-3 cells with the 1a
replicon Htat2ANeo carrying the indicated combinations of the adaptive
mutations shown in panel A. Cells were also transfected with genotype 1b
Bpp-Ntat2ANeo/SI replicon RNA as a reference. QR, Q1067R adaptive mutation; VI,
V1655I adaptive mutation; KR, K1691R adaptive mutation; KR5A, K2040R adaptive
mutation; FV, F2080V adaptive mutation; and SI, S2204I adaptive mutation.

Figure 7. Northern analysis of HCV RNA abundance 4 days following
transfection of normal Huh7 or En5-3 cells with the indicated dicistronic
subgenomic and monocistronic genome length HCV RNAs: (lane 1), normal cells;
(lane 2), the subgenomic replicon, Htat2ANeo/SI; (lanes 2-5), Htat2ANeo/SI
replicon RNAs carrying the indicated combinations of mutations; (lane 6),
nonreplicating Htat2ANeo/QR/VI/KR/KR5A/SI/AAG; (lane 7), genome-length H77c
RNA; (lanes 8-10), genome-length H77c RNA containing the indicated
combinations of mutations; (lane 11), genome-length H77 RNA containing the
lethal NS5B mutation; (lanes 12 and 13), control subgenomic and genome-length
synthetic RNA transcripts. Blots were probed with a genotype 1a probe derived
from the NS5B coding sequence for detection of HCV-specific sequence (top
panels); blots were also probed for β-actin message to assess RNA loading
(lower panels). At the top of the figure is shown the En5-3 cell culture
supernatant fluid SEAP activity induced by replicating subgenomic RNAs at the
time of cell harvest. SI, S2204I adaptive mutation; QR, Q1067R adaptive
mutation; KR, K1691R adaptive mutation; and FV, F2080V adaptive mutation.

Figure 8. Structure of the NS3/4A serine protease/helicase enzyme
complex derived from the genotype 1b BK strain of HCV (PDB 1CU1), with the
locations of adaptive mutations highlighted. (A) Wire diagram of structure
showing the NS3 helicase domain (H) and the protease domain (P). The NS4A
cofactor polypeptide (NS4A) is shown in space-filling view, with the NS3
protease active site residues (Active Site) shown in space-filling view.
Adaptive mutations identified in this study (Q1067, G1188, V1655, and K1691)
cluster near the protease active site or at sites involved in substrate
recognition, including the mutations in the NS3 protease domain at Gln-1067,
Gly-1188 and near the carboxyl terminus of NS3 in the helicase domain at
Val-1655. The NS4A adaptive mutation at Lys-1691 is just beyond the surface of
the protease, at the site of exit of the NS4A strand. Adaptive mutations within
the NS3 helicase domain that were identified in other studies, S1222, A1226,
and P1496, are shown in space-filling view, and are not close to the protease
active site. (B) Space-filling view of the structure shown in panel A, in which
the adaptive mutations and active site have similar shading. The NS3/4A
adaptive mutations identified in this study (Q1067R, G1188R, V1655I, and
K1691R) all occur at solvent accessible residues on this side of the molecule.
(C) Flip-view of the structure shown in panel B, rotated approximately 180
degrees. The helicase adaptive mutations identified in previous studies are
located on the surface of the helicase, distant from the protease active site.
Note that in the sequence of the genotype 1b BK strain of HCV, Pro-1496 is Arg
(referred to as P1496(R) in the figure), and Lys-1691 is Ser (referred to as
K1691(S) in the figure).

Figure 9. Nucleotide sequence of HIVSEAP (SEQ ID NO:7). The HIV
long terminal repeat (LTR) is depicted at nucleotides 1-719, and secretory
alkaline phosphatase is encoded by nucleotides 748-2239.

Figure 10. 10A, nucleotide sequence of a 3'NTR (SEQ ID NO:8); 10B,
nucleotide sequence of a 5'NTR (SEQ ID NO:9).

Figure 11. 11A, nucleotide sequence of a genomic length (full length)
hepatitis C virus, genotype 1a (SEQ ID NO:11); 11B, the amino acid sequence of
the HCV polyprotein (SEQ ID NO:12) encoded by the coding region present in
SEQ ID NO:11.

Figure 12. 12A, nucleotide sequence of Htat2ANeo (SEQ ID NO:13),
where nucleotides 1-341 are the 5'NTR, nucleotides 342-1454 are the tat2ANeo
(termination codon at 1455-1457), nucleotides 1458-2076 are the EMCV IRES,
nucleotides 2080-8034 encode the HCV polyprotein (initiation codon at
nucleotides 2077-2079 and termination codon at nucleotides 8035-8037),
nucleotides 8038-8259 are the 3'NTR, and nucleotides 8260-8345 are the HDV
delta ribozyme (plasmid vector sequences are shown at nucleotides 8346-11240);
12B, the amino acid sequence of the HCV polyprotein (SEQ ID NO:14) encoded
by the coding region present in SEQ ID NO:13.
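
The segment coordinates listed for Htat2ANeo can be checked for internal
consistency. The following Python sketch is offered only as an illustration:
the coordinates are copied from the description of Figure 12, while the
segment labels, the helper names, and the representation as a list of tuples
are assumptions made for the example. It confirms that the segments abut one
another without gaps or overlaps and that each coding segment is a whole number
of codons.

```python
# (label, first nucleotide, last nucleotide), 1-based and inclusive,
# as listed in the description of Figure 12.
SEGMENTS = [
    ("5'NTR", 1, 341),
    ("tat2ANeo", 342, 1454),
    ("tat2ANeo termination codon", 1455, 1457),
    ("EMCV IRES", 1458, 2076),
    ("polyprotein initiation codon", 2077, 2079),
    ("HCV polyprotein", 2080, 8034),
    ("polyprotein termination codon", 8035, 8037),
    ("3'NTR", 8038, 8259),
    ("HDV delta ribozyme", 8260, 8345),
    ("plasmid vector", 8346, 11240),
]


def is_contiguous(segments):
    """True when every segment starts one nucleotide after the previous one ends."""
    return all(prev[2] + 1 == nxt[1] for prev, nxt in zip(segments, segments[1:]))


def codon_count(first, last):
    """Number of codons in an inclusive nucleotide range; error if not a multiple of 3."""
    length = last - first + 1
    if length % 3 != 0:
        raise ValueError("segment length is not a multiple of three")
    return length // 3


if __name__ == "__main__":
    print(is_contiguous(SEGMENTS))
    print(codon_count(342, 1454), codon_count(2080, 8034))
```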

Figure 13. Nucleotide sequence (SEQ ID NO:1) of Hepatitis C virus strain H77
and amino acid sequence (SEQ ID NO:2) encoded by nucleotides 342-9377.

Figure 14. Nucleotide sequence (SEQ ID NO:3) of Hepatitis C virus strain H
and amino acid sequence (SEQ ID NO:4) encoded by nucleotides 342-9377.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS OF THE INVENTION

The present invention provides replication competent polynucleotides.
The polynucleotides include a 5' non-translated region (NTR), a 3'NTR, and a
coding sequence present between the 5'NTR and 3'NTR. The replication competent
polynucleotides of the present invention are based on hepatitis C virus (HCV),
a positive-strand virus. While the ability of a polynucleotide to replicate
typically requires the presence of the positive-strand RNA polynucleotide in a
cell, it is understood that the term "replication competent polynucleotide"
also includes the complement thereof (i.e., the negative-sense RNA), and the
corresponding DNA sequences of the positive-sense and the negative-sense RNA
sequences. Optionally, a replication competent polynucleotide may be isolated.
"Isolated" means a biological material, for instance a polynucleotide,
polypeptide, or virus particle, that has been removed from its natural
environment. For instance, a virus that has been removed from an animal or from
cultured cells in which the virus was propagated is an isolated virus. An
isolated polypeptide or polynucleotide means a polypeptide or polynucleotide
that has been either removed from its natural environment, produced using
recombinant techniques, or chemically or enzymatically synthesized. A
"purified" biological material is one that is at least 60% free, preferably 75%
free, and most preferably 90% free from other components with which it is
naturally associated.

The coding sequence encodes a hepatitis C virus polyprotein. In some
aspects of the invention, the HCV polyprotein can yield the following
polypeptides: core (also referred to as C or nucleocapsid), E1, E2, P7, NS2,
NS3, NS4A, NS4B, NS5A, and NS5B. Optionally, a full length HCV polyprotein also
yields protein F (see Xu et al., EMBO J., 20, 3840-3848 (2001)). In some
aspects of the present invention, an HCV polyprotein is shortened and yields a
subset of polypeptides, and typically does not include polypeptides encoded by
the amino terminal end of the full length HCV polyprotein. Thus, a hepatitis C
virus polyprotein may encode the polypeptides E1, E2, P7, NS2, NS3, NS4A, NS4B,
NS5A, and NS5B; E2, P7, NS2, NS3, NS4A, NS4B, NS5A, and NS5B; P7, NS2, NS3,
NS4A, NS4B, NS5A, and NS5B; NS2, NS3, NS4A, NS4B, NS5A, and NS5B; or NS3, NS4A,
NS4B, NS5A, and NS5B. The hepatitis C virus encoding such a shortened HCV
polyprotein may be referred to as a subgenomic hepatitis C virus, and the
shortened HCV polyprotein may be referred to as a subgenomic HCV polyprotein.
In other aspects of the invention, a replication competent polynucleotide
encodes an HCV polyprotein that does not include polypeptides present in an
internal portion of a hepatitis C virus polyprotein. Thus, a subgenomic
hepatitis C virus polyprotein may encode, for instance, the polypeptides NS3,
NS4A, NS4B, and NS5B.

In those aspects of the invention where the replication competent
polynucleotide includes a coding region that encodes less than a full length
HCV polyprotein, the 5' end of the coding region encoding the HCV polyprotein
may further include about 33 to about 51 nucleotides, or about 36 to about 48
nucleotides, that encode the first about 11 to about 17, or about 12 to about
16, amino acids of the core polypeptide. The result is a fusion polypeptide
made up of amino terminal amino acids of the core polypeptide and the first
polypeptide encoded by the first cleavage product of the polyprotein, e.g., E1,
or E2, or P7, or NS2, etc.
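
The nucleotide and amino acid ranges given above correspond at three
nucleotides per codon. A minimal Python sketch of that arithmetic follows; the
function name is an assumption made for the example.

```python
def codons(nucleotides):
    """Number of amino acids encoded by a nucleotide stretch read in frame."""
    if nucleotides % 3 != 0:
        raise ValueError("length is not a whole number of codons")
    return nucleotides // 3


if __name__ == "__main__":
    # Broader range (33 to 51 nt) and narrower range (36 to 48 nt).
    print(codons(33), codons(51))
    print(codons(36), codons(48))
```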

A polyprotein that can yield the core, E1, E2, P7, NS2, NS3, NS4A,
NS4B, NS5A, and NS5B polypeptides (a full length polyprotein) is typically
between about 3000 and 3033 amino acids in length, preferably about 3011 amino
acids in length. The relationship between such a polyprotein and the
corresponding residues of the individual polypeptides resulting after
post-translational processing is shown in Table 1. This numbering system is
used herein when referring to a full length polyprotein, and when referring to
a polyprotein that contains a portion of the full length polyprotein. For
instance, in those aspects of the invention where the replication competent
polynucleotide includes a coding sequence encoding an HCV polyprotein that
yields the cleavage products NS3, NS4A, NS4B, NS5A, and NS5B and there is no
fusion polypeptide made up of amino terminal amino acids of the core
polypeptide and the cleavage product NS3, the first amino acid of the NS3
polypeptide is considered to be about residue number 1027. A person of ordinary
skill in the art recognizes that this numbering system can vary between members
of different genotypes, and between members of the same genotype, thus the
numbers shown in Table 1 are approximate, and can vary by 1, 2, 3, 4, or about
5.

Table 1. Correspondence between amino acids of polyprotein and individual
polypeptides.

Amino acids of HCV polyprotein (a)    Corresponding polypeptide after processing
1-191                                 Core
192-383                               E1
384-746                               E2
747-809                               P7
810-1026                              NS2
1027-1657                             NS3
1658-1711                             NS4A
1712-1972                             NS4B
1973-2420                             NS5A
2421-3011                             NS5B

(a) Refers to the approximate amino acid number prior to cleavage of the
polyprotein where the first amino acid is the first amino acid of the
polyprotein expressed by the HCV at Genbank Accession number AF011751 and
Genbank Accession number M67463.
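
The numbering system of Table 1 can be used to convert a polyprotein residue
number into the corresponding polypeptide and the residue number within that
polypeptide. The following Python sketch is illustrative only: the boundaries
are the approximate values from Table 1, and the names TABLE_1 and locate are
assumptions made for the example.

```python
# Approximate boundaries from Table 1 (1-based, inclusive).
TABLE_1 = [
    ("Core", 1, 191),
    ("E1", 192, 383),
    ("E2", 384, 746),
    ("P7", 747, 809),
    ("NS2", 810, 1026),
    ("NS3", 1027, 1657),
    ("NS4A", 1658, 1711),
    ("NS4B", 1712, 1972),
    ("NS5A", 1973, 2420),
    ("NS5B", 2421, 3011),
]


def locate(residue):
    """Return (polypeptide, residue number within that polypeptide)."""
    for name, first, last in TABLE_1:
        if first <= residue <= last:
            return name, residue - first + 1
    raise ValueError("residue lies outside the full length polyprotein")


if __name__ == "__main__":
    for position in (1067, 1496, 1691, 2204):
        print(position, locate(position))
```

Because the boundaries are approximate, results for residues within a few
positions of a cleavage site should be read with the stated tolerance of about
5 residues in mind.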



A replication competent polynucleotide of the present invention includes
at least one adaptive mutation. As used herein, an adaptive mutation is a
change in the amino acid sequence of the polyprotein that increases the ability
of a replication competent polynucleotide to replicate compared to a
replication competent polynucleotide that does not have the adaptive mutation.
One adaptive mutation that a replication competent polynucleotide of the
present invention typically includes is an isoleucine at about amino acid 2204,
which is about amino acid 232 of NS5A. Most clinical HCV isolates and
molecularly cloned laboratory HCV strains include a serine at this position,
and this mutation has been referred to in the art as S2204I. In most
replication competent polynucleotides, the location of this adaptive mutation
can also be determined by locating the amino acid sequence SSSA beginning at
about amino acid 2200 in the HCV polyprotein, where the amino acid immediately
following the SSSA sequence is isoleucine.

A replication competent polynucleotide of the present invention may also
include one or more of the adaptive mutations described herein, or a combination
thereof. The first such adaptive mutation is an arginine at about amino acid 1067,
which is about amino acid 41 of NS3. Most clinical HCV isolates and
molecularly cloned laboratory HCV strains include a glutamine at this position,
thus this mutation can be referred to as Q1067R. In most replication competent
polynucleotides, the location of this adaptive mutation can also be determined by
locating the amino acid sequence STAT beginning at about amino acid 1063 in
the HCV polyprotein, where the amino acid immediately following the STAT
sequence is arginine. The second adaptive mutation is an arginine at about amino
acid 1691, which is about amino acid 34 of NS4A. Most clinical HCV isolates
and molecularly cloned laboratory HCV strains include a lysine at this position,
thus this mutation can be referred to as K1691R. In most replication competent
polynucleotides, the location of this adaptive mutation can also be determined by
locating the amino acid sequence VLSG beginning at about amino acid 1687 in
the HCV polyprotein, where the amino acid immediately following the VLSG
sequence is arginine. The third adaptive mutation is a valine at about amino acid
2080, which is about amino acid 108 of NS5A. Most clinical HCV isolates and
molecularly cloned laboratory HCV strains include a phenylalanine at this
position, thus this mutation can be referred to as F2080V. In most replication
competent polynucleotides, the location of this adaptive mutation can also be
determined by locating the amino acid sequence ALWR beginning at about amino
acid 2081 in the HCV polyprotein, where the amino acid immediately before the
ALWR sequence is valine. A fourth adaptive mutation is an isoleucine at about
amino acid 1655, which is about amino acid 629 of NS3. Most clinical HCV
isolates and molecularly cloned laboratory HCV strains include a valine at this
position, thus this mutation can be referred to as V1655I. In most replication
competent polynucleotides, the location of this adaptive mutation can also be
determined by locating the amino acid sequence ADLE beginning at about amino
acid 1651 in the HCV polyprotein, where the amino acid immediately after the
ADLE sequence is isoleucine. A fifth adaptive mutation is an arginine at about
amino acid 2040, which is about amino acid 68 of NS5A. Most clinical HCV
isolates and molecularly cloned laboratory HCV strains include a lysine at this
position, thus this mutation can be referred to as K2040R. In most replication
competent polynucleotides, the location of this adaptive mutation can also be
determined by locating the amino acid sequence GHVXN beginning at about
amino acid 2037 in the HCV polyprotein, where the X in the amino acid sequence
is arginine. A sixth adaptive mutation is an arginine at about amino acid 1188,
which is about amino acid 162 of NS3. Most clinical HCV isolates and
molecularly cloned laboratory HCV strains include a glycine at this position, thus
this mutation can be referred to as G1188R. In most replication competent
polynucleotides, the location of this adaptive mutation can also be determined by
locating the amino acid sequence VCTR beginning at about amino acid 1184 in
the HCV polyprotein, where the amino acid immediately following the VCTR
sequence is arginine. In some aspects, the replication competent polynucleotide
of the present invention includes the Q1067R and K1691R adaptive mutations, as
well as the S2204I adaptive mutation. These adaptive mutations are summarized
in Table 2. A person of ordinary skill in the art recognizes that the precise
location of these cell culture adaptive mutations can vary between members of
different genotypes, and between members of the same genotype, thus the
numbers shown in Table 2 are approximate, and can vary by 1, 2, 3, 4, or about 5.

Table 2. Adaptive Mutations

Symbol (1)    Protein / Residue (2)    Mutation (3)
QR            NS3/41                   Q1067R
GR            NS3/162                  G1188R
VI            NS3/629                  V1655I
KR            NS4A/34                  K1691R
KR5A          NS5A/68                  K2040R
FV            NS5A/108                 F2080V
SI            NS5A/232                 S2204I

(1) Symbol used to designate presence in RNA transcripts.
(2) Residue refers to position in protein after post-translational cleavage of
the H77c polyprotein (GenBank accession AF011751).
(3) Number refers to position of mutation in H77c polyprotein before post-
translational cleavage (GenBank accession AF011751).
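
Because the positions of the adaptive mutations are approximate, the text locates several of them by a flanking four-residue motif. A minimal Python sketch of that lookup follows, covering four of the mutations (Q1067R, K1691R, F2080V, and S2204I). The function names, the tuple layout, and the default tolerance of 5 residues are assumptions made for illustration; the motifs and positions are those given in the text.

```python
# (mutation, approximate 1-based motif start, motif,
#  offset of the mutated residue from the motif start, adaptive residue)
ADAPTIVE_SITES = [
    ("Q1067R", 1063, "STAT", 4, "R"),   # residue immediately after STAT
    ("K1691R", 1687, "VLSG", 4, "R"),   # residue immediately after VLSG
    ("F2080V", 2081, "ALWR", -1, "V"),  # residue immediately before ALWR
    ("S2204I", 2200, "SSSA", 4, "I"),   # residue immediately after SSSA
]


def has_adaptive_mutation(polyprotein, motif_start, motif, offset, residue,
                          tolerance=5):
    """Return True if the motif lies within `tolerance` residues of its
    expected start and the residue at `offset` from it is `residue`."""
    expected = motif_start - 1  # convert to a 0-based index
    first = max(0, expected - tolerance)
    for start in range(first, expected + tolerance + 1):
        if polyprotein.startswith(motif, start):
            target = start + offset
            if 0 <= target < len(polyprotein) and polyprotein[target] == residue:
                return True
    return False


def find_adaptive_mutations(polyprotein):
    """List the mutations from ADAPTIVE_SITES present in a polyprotein string."""
    return [
        name
        for name, motif_start, motif, offset, residue in ADAPTIVE_SITES
        if has_adaptive_mutation(polyprotein, motif_start, motif, offset, residue)
    ]
```

The input is a one-letter amino acid string numbered from the first residue of the polyprotein. A subgenomic polyprotein would first have to be shifted into full length numbering, for example with NS3 starting at about residue 1027.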

There are many other adaptive mutations known to the art, and the
replication competent polynucleotides of the present invention may include one or
more of those adaptive mutations. Examples of known adaptive mutations can be
found in, for instance, Bartenschlager (U.S. Patent 6,630,343), Blight et al.
(Science, 290, 1972-1975 (2000)), Lohmann et al. (Abstract P038, 7th
International Meeting on Hepatitis C virus and Related viruses (Molecular
Virology and Pathogenesis), December 3-7 (2000)), Guo et al. (Abstract P045, 7th
International Meeting on Hepatitis C virus and Related viruses (Molecular
Virology and Pathogenesis), December 3-7 (2000)), Blight et al. (J. Virol., 77,
3181-3190 (2003)), Gu et al. (J. Virol., 77, 5352-5359 (2003)), and Grobler et al.
(J. Biol. Chem., 278, 16741-16746 (Feb. 2003)).

It is expected that polynucleotides encoding an HCV polyprotein can be
obtained from different sources, including molecularly cloned laboratory strains,
for instance cDNA clones of HCV, and clinical isolates. Examples of molecularly
cloned laboratory strains include the HCV that is encoded by pCV-H77C (Yanagi
et al., Proc. Natl. Acad. Sci. USA, 94, 8738-8743 (1997), Genbank accession
number AF011751, SEQ ID NO:1), and pHCV-H (Inchauspe et al., Proc. Natl.
Acad. Sci. USA, 88, 10292-10296 (1991), Genbank accession number M67463,
SEQ ID NO:3). Clinical isolates can be from a source of infectious HCV,
including tissue samples, for instance from blood, plasma, serum, liver biopsy, or
leukocytes, from an infected animal, including a human or a primate. It is also
expected that the polynucleotide encoding the HCV polyprotein present in a
replication competent polynucleotide can be prepared by recombinant, enzymatic,
or chemical techniques. The nucleotide sequence of molecularly cloned
laboratory strains and clinical isolates can be modified to encode an HCV
polyprotein that includes the S2204I adaptive mutation and one or more of the
adaptive mutations described herein. Such methods are routine and known to the
art and include, for instance, PCR mutagenesis.

The present invention further includes replication competent
polynucleotides encoding an HCV polyprotein having similarity with the amino
acid sequence of SEQ ID NO:2, SEQ ID NO:4 (in the case of a full length
polyprotein), or a portion thereof (in the case of an HCV polyprotein encoding, for
instance, NS3, NS4A, NS4B, NS5A, and NS5B, and not encoding core, E1, E2,
P7, and NS2). The similarity is referred to as structural similarity and is generally
determined by aligning the residues of the two amino acid sequences (i.e., a
candidate amino acid sequence and the amino acid sequence of SEQ ID NO:2,
SEQ ID NO:4, or a portion thereof) to optimize the number of identical amino
acids along the lengths of their sequences; gaps in either or both sequences are
permitted in making the alignment in order to optimize the number of identical
amino acids, although the amino acids in each sequence must nonetheless remain
in their proper order. A candidate amino acid sequence is the amino acid
sequence being compared to an amino acid sequence present in SEQ ID NO:2,
SEQ ID NO:4, or a portion thereof. A candidate amino acid sequence can be
isolated from a cell infected with a hepatitis C virus, or can be produced using
recombinant techniques, or chemically or enzymatically synthesized. Preferably,
two amino acid sequences are compared using the Blastp program of the BLAST
2 search algorithm, as described by Tatusova, et al. (FEMS Microbiol Lett 1999,
174:247-250), and available at http://www.ncbi.nlm.nih.gov/gorf/bl2.html.
Preferably, the default values for all BLAST 2 search parameters are used,
including matrix = BLOSUM62; open gap penalty = 11, extension gap penalty =
1, gap x_dropoff = 50, expect = 10, wordsize = 3, and optionally, filter on. In the
comparison of two amino acid sequences using the BLAST search algorithm,
structural similarity is referred to as "identities." An HCV polyprotein may
include an amino acid sequence having a structural similarity with SEQ ID NO:2,
SEQ ID NO:4, or a portion thereof, of at least about 90%, for example 91%, 92%,
93% identity, and so on to 100% identity. A replication competent
polynucleotide having a 5' NTR of SEQ ID NO:9, a 3' NTR of SEQ ID NO:8, and
an HCV polyprotein with structural similarity with SEQ ID NO:2, SEQ ID NO:4, or
a portion thereof, is replication competent in a cell derived from a human
hepatoma such as Huh-7 and Huh-7.5. An HCV polyprotein having structural
similarity with the amino acid sequence of SEQ ID NO:2, SEQ ID NO:4, or a
portion thereof, includes the S2204I adaptive mutation and one or more of the
adaptive mutations described herein. Such an HCV polyprotein may optionally
include other adaptive mutations.
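
The "identities" figure described above can be illustrated with a short Python sketch. It does not perform the BLAST alignment itself; it assumes the two sequences have already been aligned (for example by Blastp) and are supplied as equal-length strings with "-" marking gaps. The function names and the example strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns in which both rows hold the same residue.

    Gap columns count toward the alignment length but never as identities.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * identical / len(aligned_a)


def meets_similarity_threshold(aligned_a, aligned_b, threshold=90.0):
    """True if the aligned pair reaches the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Ten aligned columns, one of which is a gap in the second row.
    print(percent_identity("MSTNPKPQRK", "MSTNPKPQ-K"))
```

The default threshold of 90 mirrors the "at least about 90%" figure in the text; any of the higher values mentioned could be passed instead.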

In some aspects, the coding sequence of a replication competent
polynucleotide of the present invention that encodes a hepatitis C virus
polyprotein is not limited to a specific genotype. For instance, a polynucleotide
encoding an HCV polyprotein present in a replication competent polynucleotide of
the present invention can be genotype 1a, 1b, 1c, 2a, 2b, 2c, 3a, 3b, 4, 5a, or 6a (as
defined by Simmonds, Hepatology, 21, 570-583 (1995)). In other aspects, the HCV
polyprotein is genotype 1a. Methods for determining the genotype of a hepatitis C
virus are routine and known to the art and include, for instance, serotyping the
virus particle using antibody, and/or evaluation of the nucleotide sequence by, for
instance, polymerase chain reaction assays (see Simmonds, J. Hepatol., 31 (Suppl.
1), 54-60 (1999)).

The present invention includes polynucleotides encoding an amino acid
sequence having similarity to an HCV polyprotein. The similarity is referred to as
structural similarity and is determined by aligning the residues of two
polynucleotides (e.g., the nucleotide sequence of the candidate coding region and
nucleotides 342 - 9377 of SEQ ID NO:1 or nucleotides 342 - 9377 of SEQ ID
NO:3) to optimize the number of identical nucleotides along the lengths of their
sequences; gaps in either or both sequences are permitted in making the alignment
in order to optimize the number of shared nucleotides, although the nucleotides in
each sequence must nonetheless remain in their proper order. A candidate coding
region is the coding region being compared to a coding region present in SEQ ID
NO:1 (e.g., nucleotides 342 - 9377 of SEQ ID NO:1). A candidate nucleotide
sequence can be isolated from a cell, or can be produced using recombinant
techniques, or chemically or enzymatically synthesized. Preferably, two
nucleotide sequences are compared using the Blastn program of the BLAST 2
search algorithm, as described by Tatusova, et al. (FEMS Microbiol Lett 1999,
174:247-250), and available at http://www.ncbi.nlm.nih.gov/gorf/bl2.html.
Preferably, the default values for all BLAST 2 search parameters are used,
including reward for match = 1, penalty for mismatch = -2, open gap penalty = 5,
extension gap penalty = 2, gap x_dropoff = 50, expect = 10, wordsize = 11, and
optionally, filter on. In the comparison of two nucleotide sequences using the
BLAST search algorithm, structural similarity is referred to as "identities."

The present invention also includes polynucleotides encoding the HCV
polyproteins described herein, including, for instance, the polyproteins having the
amino acid sequence shown in SEQ ID NO:2 and SEQ ID NO:4. Examples of
the class of nucleotide sequences encoding each of these polyproteins are
nucleotides 342 - 9377 of SEQ ID NO:1 and nucleotides 342 - 9377 of SEQ ID
NO:3, respectively. These classes of nucleotide sequences are large but finite, and
the nucleotide sequence of each member of the class can be readily determined by
one skilled in the art by reference to the standard genetic code.
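
The statement that each class of encoding sequences is "large but finite" follows from the degeneracy of the standard genetic code: the number of distinct coding sequences for a polypeptide is the product of the number of codons available for each residue. The Python sketch below makes that count explicit. The names are hypothetical, and the codon table is the standard code written in the conventional TCAG order.

```python
from collections import Counter

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Number of synonymous codons for each amino acid ('*' marks stop codons).
CODONS_PER_RESIDUE = Counter(CODON_TABLE.values())


def count_coding_sequences(peptide):
    """Number of distinct nucleotide sequences that encode `peptide`."""
    total = 1
    for residue in peptide:
        if residue not in CODONS_PER_RESIDUE or residue == "*":
            raise ValueError(f"not a standard amino acid: {residue!r}")
        total *= CODONS_PER_RESIDUE[residue]
    return total


if __name__ == "__main__":
    # Serine has six codons and alanine four, so SSSA has 6 * 6 * 6 * 4 encodings.
    print(count_coding_sequences("SSSA"))
```

For a polyprotein of roughly 3,000 residues the count is astronomically large, yet every member of the class can still be written down from the amino acid sequence and the table above.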

A replication competent polynucleotide of the present invention includes a
5' non-translated region (NTR) (see Smith et al., J. Gen. Virol., 76, 1749-1761
(1995)). A 5' NTR is typically about 341 nucleotides in length. A replication
competent polynucleotide of the present invention also includes a 3' NTR. A 3'
NTR typically includes, from 5' to 3', nucleotides of variable length and sequence
(referred to as the variable region), a poly-pyrimidine tract (the poly U-UC
region), and a highly conserved sequence of about 100 nucleotides (the conserved
region) (see, for instance, Lemon et al., U.S. Published Application US 2003
0125541, and Yi and Lemon, J. Virol., 77, 3557-3568 (2003)). The variable
region begins at about the first nucleotide following the stop codon of the HCV
polyprotein, and generally ends immediately before the nucleotides of the poly U-
UC region. The poly U-UC region is a stretch of predominantly U residues, CU
residues, or C(U)n-repeats. When the nucleotide sequence of a variable region is
compared between members of the same genotype, there is typically a great deal
of similarity; however, there is typically very little similarity in the nucleotide
sequence of the variable regions between members of different genotypes (see, for
instance, Yamada et al., Virology, 223, 255-261 (1996)).

It is expected that a 5'NTR and a 3'NTR can be obtained from different
sources, including molecularly cloned laboratory strains, for instance cDNA
clones of HCV, and clinical isolates. Examples of molecularly cloned laboratory
strains include the HCV that is encoded by pCV-H77C (Yanagi et al., Proc. Natl.
Acad. Sci. USA, 94, 8738-8743 (1997), Genbank accession number AF011751,
SEQ ID NO:1, where nucleotides 1-341 are the 5'NTR and nucleotides 9378-
9599 are the 3'NTR), and pHCV-H (Inchauspe et al., Proc. Natl. Acad. Sci. USA,
88, 10292-10296 (1991), Genbank accession number M67463, SEQ ID NO:3,
where nucleotides 1-341 are the 5'NTR and nucleotides 9378-9416 are the 3'
NTR). Clinical isolates can be from a source of infectious HCV, including tissue
samples, for instance from blood, plasma, serum, liver biopsy, or leukocytes, from
an infected animal, including a human or a primate. It is also expected that the
polynucleotide encoding the HCV polyprotein present in a replication competent
polynucleotide can be prepared by recombinant, enzymatic, or chemical
techniques.

In some aspects, a 5'NTR and a 3'NTR of a replication competent
polynucleotide of the present invention are not limited to a specific genotype. For
instance, a 5'NTR and a 3'NTR present in a replication competent polynucleotide
of the present invention can be genotype 1a, 1b, 1c, 2a, 2b, 2c, 3a, 3b, 4, 5a, or 6a
(as defined by Simmonds, Hepatology, 21, 570-583 (1995)). In other aspects, the
HCV polyprotein is genotype 1a. Methods for determining the genotype of a 5'
NTR and a 3'NTR are routine and known to the art and include evaluation of the
nucleotide sequence for specific nucleotides that are characteristic of a specific
genotype.

In some aspects of the invention a replication competent polynucleotide
includes a second coding region. The second coding sequence may be present in
the 3'NTR, for instance, in the variable region of the 3'NTR. In some aspects of
the invention, the second coding region is present in the variable region such that
the variable region is not removed. Alternatively, the second coding region
replaces the variable region in whole or in part. In some aspects of the invention,
for instance, when the HCV has the genotype 1a, the second coding region is
inserted in the variable region between nucleotides 5 and 6 of the sequence 5'
CUCUUAAGC 3', where the sequence shown corresponds to the positive-strand.
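
The insertion point described above, between nucleotides 5 and 6 of 5' CUCUUAAGC 3', can be sketched as a simple string operation on a positive-strand RNA sequence. This is only an illustration: the function name is hypothetical, the sketch uses the first occurrence of the site, and the sequences in the example are made up rather than taken from any SEQ ID NO.

```python
INSERTION_SITE = "CUCUUAAGC"  # positive-strand sequence given in the text


def insert_second_coding_region(three_prime_ntr, insert,
                                site=INSERTION_SITE, offset=5):
    """Insert `insert` after the first `offset` nucleotides of `site`.

    With the defaults, the insert lands between nucleotides 5 and 6 of
    CUCUUAAGC, i.e. between CUCUU and AAGC.
    """
    start = three_prime_ntr.find(site)
    if start == -1:
        raise ValueError("insertion site not found in the 3' NTR")
    cut = start + offset
    return three_prime_ntr[:cut] + insert + three_prime_ntr[cut:]


if __name__ == "__main__":
    # Made-up flanking nucleotides around the site, and a made-up insert.
    print(insert_second_coding_region("GGCUCUUAAGCCC", "AUGUAA"))
```

Because the variable region differs between genotypes, a different site string would be needed for isolates that lack this exact sequence.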

In some aspects of the invention, the second coding region is present in a
replication competent polynucleotide downstream of the 5'NTR, and upstream of
the first coding region, i.e., the coding region encoding a HCV polyprotein. For
instance, the first nucleotide of the second coding region may be immediately
downstream and adjacent to the last nucleotide of the 5' NTR. Alternatively, the
first nucleotide of the second coding region may be further downstream of the last
nucleotide of the 5'NTR, for instance, about 2 to about 51 nucleotides, about 33 to
about 51 nucleotides, or about 36 to about 48 nucleotides downstream of the last
nucleotide of the 5'NTR. Typically, when the first nucleotide of the second
coding region is not immediately downstream of the last nucleotide of the 5'NTR,
the nucleotides in between the 5'NTR and the second coding region encode the
amino terminal amino acids of the HCV core polypeptide. For instance, the 5' end
of the second coding region may further include about 33 to about 51 nucleotides,
or about 36 to about 48 nucleotides, that encode the first about 11 to about 17, or
about 12 to about 16, amino acids of the core polypeptide. The result is a fusion
polypeptide made up of amino terminal amino acids of the core polypeptide and
the polypeptide encoded by the second coding region (see, for instance, Yi et al.,
Virol., 304, 197-210 (2002), and U.S. Published Application US 2003 0125541).
Without intending to be limiting, it is believed the presence of the nucleotides
from the core coding sequence acts to enhance translation of the polypeptide
encoded by the second coding region.

In those aspects of the invention where the second coding region in
a replication competent polynucleotide is present downstream of the 5'NTR and
upstream of the coding region encoding the HCV polyprotein, the replication
competent polynucleotide typically includes a regulatory region operably linked to
the downstream coding region, e.g., the coding region encoding the HCV
polyprotein. Preferably, the regulatory region provides for the translation of the
downstream coding region. The size of the regulatory region may be from about
400 nucleotides to about 800 nucleotides, more preferably, about 600 nucleotides
to about 700 nucleotides. Typically, the regulatory region is an IRES. Examples
of IRES elements are described herein.

The second coding region can encode a polypeptide including, for
instance, a marker, including a detectable marker and/or a selectable marker.
Examples of detectable markers include molecules having a detectable enzymatic
activity, for instance, secretory alkaline phosphatase, molecules having a
detectable fluorescence, for instance, green or red or blue fluorescent protein, and
molecules that can be detected by antibody. Examples of selectable markers
include molecules that confer resistance to antibiotics able to inhibit the
replication of eukaryotic cells, including the antibiotics kanamycin, ampicillin,
chloramphenicol, tetracycline, blasticidin, neomycin, and formulations of
phleomycin D1 including, for example, the formulation available under the trade-
name ZEOCIN (Invitrogen, Carlsbad, California). Coding sequences encoding
such markers are known to the art. Other examples of polypeptides that can be
encoded by the second coding region include a transactivator, and/or a fusion
polypeptide. Preferably, when the polypeptide is a fusion polypeptide, the second
coding region includes nucleotides encoding a marker, more preferably,
nucleotides encoding a fusion between a transactivator and a marker.
Transactivators are described herein below. Optionally, the coding region can
encode an immunogenic polypeptide. A replication competent polynucleotide
containing a second coding region is typically dicistronic, i.e., the coding region
encoding the HCV polyprotein and the second coding region are separate.

An "immunogenic polypeptide" refers to a polypeptide which elicits an
immunological response in an animal. An immunological response to a
polypeptide is the development in a subject of a cellular and/or antibody-mediated
immune response to the polypeptide. Usually, an immunological response
includes but is not limited to one or more of the following effects: the production
of antibodies, B cells, helper T cells, suppressor T cells, and/or cytotoxic T cells,
directed specifically to an epitope or epitopes of the polypeptide fragment.

A transactivator is a polypeptide that affects in trans the expression of a
coding region, preferably a coding region integrated in the genomic DNA of a
cell. Such coding regions are referred to herein as "transactivated coding
regions." The cells containing transactivated coding regions are described in
detail herein below. Transactivators useful in the present invention include those
that can interact with a regulatory region, preferably an operator sequence, that is
operably linked to a transactivated coding region. As used herein, the term
"transactivator" includes polypeptides that interact with an operator sequence and
either prevent transcription from initiating at, activate transcription initiation from,
or stabilize a transcript from, a transactivated coding region operably linked to the
operator sequence. Examples of useful transactivators include the HIV tat
polypeptide (see, for example, the polypeptides
MEPVDPRLEPWKHPGSQPKTACTNCYCKKCCFHCQVCFITKALGISYGRK
KRRQRRRAHQNSQTHQASLSKQPTSQPRGDPTGPKE (SEQ ID NO:5), which
is encoded by nucleotides 5377 to 5591 and 7925 to 7970 of Genbank accession
number AF033819, and
MEPVDPRLEPWKHPGSQPKTACTNCYCKKCCFHCQVCFITKALGISYGRK
KRRQRRRPPQGSQTHQVSLSKQPTSQSRGDPTGPKE (SEQ ID NO:10)). The
HIV tat polypeptide interacts with the HIV long terminal repeat (LTR). Other
useful transactivators include human T cell leukemia virus tax polypeptide (which
binds to the operator sequence tax response element, Fujisawa et al., J. Virol., 65,
4525-4528 (1991)), and transactivating polypeptides encoded by spumaviruses in
the region between env and the LTR, such as the bel-1 polypeptide in the case of
human foamy virus (which binds to the U3 domain of these viruses, Rethwilm et
al., Proc. Natl. Acad. Sci. USA, 88, 941-945 (1991)). Alternatively, a post-
transcriptional transactivator, such as HIV rev, can be used. HIV rev binds to a
234 nucleotide RNA sequence in the env gene (the rev-response element, or RRE)
of HIV (Hadzopoulou-Cladaras et al., J. Virol., 63, 1265-1274 (1989)).

Other transactivators that can be used are those having similarity with the
amino acid sequence of SEQ ID NO:5 or SEQ ID NO:10. The similarity is
generally determined as described herein above. A candidate amino acid
sequence that is being compared to an amino acid sequence present in SEQ ID
NO:5 or SEQ ID NO:10 can be isolated from a virus, or can be produced using
recombinant techniques, or chemically or enzymatically synthesized. Preferably,
two amino acid sequences are compared using the Blastp program of the BLAST
2 search algorithm, as described herein above. Preferably, a transactivator
includes an amino acid sequence having a structural similarity with SEQ ID NO:5
or SEQ ID NO:10, of at least about 90 %, at least about 94 %, at least about 96 %,
at least about 97 %, at least about 98 %, or at least about 99 % identity. Typically,
an amino acid sequence having a structural similarity with SEQ ID NO:5 or SEQ
ID NO:10 has tat activity. Whether such a polypeptide has activity can be
evaluated by determining if the amino acid sequence can interact with an HIV
LTR, preferably alter transcription from a coding sequence operably linked to an
HIV LTR. Useful HIV LTRs are described herein.

Active analogs or active fragments of a transactivator can be used in the
invention. An active analog or active fragment of a transactivator is one that is
able to interact with an operator sequence and either prevent transcription from
initiating at, activate transcription initiation from, or stabilize a transcript from, a
transactivated coding region operably linked to the operator sequence.

Active analogs of a transactivator include polypeptides having
conservative amino acid substitutions that do not eliminate the ability to interact
with an operator and alter transcription. Substitutes for an amino acid may be
selected from other members of the class to which the amino acid belongs. For
example, nonpolar (hydrophobic) amino acids include alanine, leucine, isoleucine,
valine, proline, phenylalanine, tryptophan, and tyrosine. Polar neutral amino acids
include glycine, serine, threonine, cysteine, tyrosine, asparagine, and glutamine.
The positively charged (basic) amino acids include arginine, lysine, and histidine.
The negatively charged (acidic) amino acids include aspartic acid and glutamic
acid. Examples of preferred conservative substitutions include Lys for Arg and
vice versa to maintain a positive charge; Glu for Asp and vice versa to maintain a
negative charge; Ser for Thr so that a free -OH is maintained; and Gln for Asn to
maintain a free NH2.
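
The class-based rule for conservative substitutions can be expressed as a small lookup. The Python sketch below assumes the conventional grouping in which the polar neutral class comprises glycine, serine, threonine, cysteine, tyrosine, asparagine, and glutamine, with tyrosine also listed among the nonpolar residues as in the text. The names are hypothetical, and the check says nothing about whether a given analog retains activity, which must be determined experimentally.

```python
# Amino acid classes in one-letter code. Tyrosine (Y) appears in two classes.
AMINO_ACID_CLASSES = {
    "nonpolar": set("ALIVPFWY"),
    "polar_neutral": set("GSTCYNQ"),
    "basic": set("RKH"),
    "acidic": set("DE"),
}


def is_conservative(original, substitute):
    """True if both residues belong to at least one common class."""
    return any(
        original in members and substitute in members
        for members in AMINO_ACID_CLASSES.values()
    )


if __name__ == "__main__":
    # The preferred substitutions named in the text.
    for pair in [("R", "K"), ("D", "E"), ("T", "S"), ("N", "Q")]:
        print(pair, is_conservative(*pair))
```

A substitution such as lysine for aspartic acid crosses classes and is therefore not conservative under this rule.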

Active fragments of a transactivator include a portion of the transactivator
containing deletions or additions of about 1, about 2, about 3, about 4, or at least
about 5 contiguous or noncontiguous amino acids such that the resulting
transactivator will alter expression of an operably linked transactivated coding
region. A preferred example of an active fragment of the HIV tat polypeptide
includes amino acids 1-48 of SEQ ID NO:5, or amino acids 1-48 of
SEQ ID NO:10.

In those aspects of the invention where the second coding region encodes a
fusion polypeptide, the fusion polypeptide can further include amino acids
corresponding to a cis-active proteinase. When the fusion polypeptide is a fusion
between a transactivator and a marker, preferably the fusion polypeptide also
includes amino acids corresponding to a cis-active proteinase. Preferably the
amino acids corresponding to a cis-active proteinase are present between the
amino acids corresponding to the transactivator and the marker. A cis-active
proteinase in this position allows the amino acids corresponding to the
transactivator and the marker to be physically separate from each other in the cell
within which the replication competent polynucleotide is present. Examples of
cis-active proteinases that are useful in the present invention include the cis-active
2A proteinase of foot-and-mouth disease virus (FMDV) (see, for example, US
Patent 5,846,767 (Halpin et al.) and US Patent 5,912,167 (Palmenberg et al.)),
ubiquitin (see, for example, Tautz et al., Virology, 197, 74-85 (1993)), and the NS3
recognition site GADTEDVVCCSMSY (SEQ ID NO:6) (see, for example, Lai et
al., J. Virol., 74, 6339-6347 (2000)).

Active analogs and active fragments of cis-active proteinases can also be
used. Active analogs of a cis-active proteinase include polypeptides having
conservative amino acid substitutions that do not eliminate the ability of the
proteinase to catalyze cleavage. Active fragments of a cis-active proteinase
include a portion of the cis-active proteinase containing deletions or additions of
one or more contiguous or noncontiguous amino acids such that the resulting
cis-active proteinase will catalyze the cleavage of the proteinase.

In some aspects of the invention, the second coding region may further
include an operably linked regulatory region. Preferably, a regulatory region
located 5' of the operably linked coding region provides for the translation of the
coding region.

A preferred regulatory region located 5' of an operably linked second
coding region is an internal ribosome entry site (IRES). An IRES allows a
ribosome access to mRNA without a requirement for cap recognition and
subsequent scanning to the initiator AUG (Pelletier, et al., Nature, 334, 320-325
(1988)). An IRES is located upstream of the translation initiation codon, e.g.,
ATG or AUG, of the coding sequence to which the IRES is operably linked. The
distance between the IRES and the initiation codon is dependent on the type of
IRES used, and is known to the art. For instance, poliovirus IRES initiates a
ribosome translocation/scanning process to a downstream AUG codon. For other
IRES elements, the initiator codon is generally located at the 3' end of the IRES
sequence. Examples of an IRES that can be used in the invention include a viral
IRES, preferably a picornaviral IRES or a flaviviral IRES. Examples of
picornaviral IRES elements include, for instance, poliovirus IRES,
encephalomyocarditis virus IRES, or hepatitis A virus IRES. Examples of
preferred flaviviral IRES elements include hepatitis C virus IRES, GB virus B
IRES, or a pestivirus IRES, including but not limited to bovine viral diarrhea virus
IRES or classical swine fever virus IRES. Other IRES elements with similar
secondary and tertiary structure and translation initiation activity can either be
generated by mutation of these viral sequences, by cloning of analogous
sequences from other viruses (including picornaviruses), or prepared by enzymatic
synthesis techniques.

The size of the second coding region is not critical to the invention. It is
expected there is no lower limit on the size of the second coding region, and that
there is an upper limit on the size of the second coding region. This upper limit
can be easily determined by a person skilled in the art, as second coding regions
that are greater than this upper limit adversely affect replication of a replication
competent polynucleotide. The second coding region is typically at least about 10
nucleotides, at least about 20 nucleotides, at least about 30 nucleotides, or at least
about 40 nucleotides.

A replication competent polynucleotide may also include a nucleotide
sequence having cis-acting ribozyme activity. Such a ribozyme is typically
present at the 3' end of the 3'NTR of a replication competent polynucleotide, and
generates a precise 3' terminal end of the replication competent polynucleotide
when it is an RNA molecule by cleaving the junction between the replication
competent polynucleotide and the ribozyme. This can be advantageous when the
replication competent polynucleotide is to be used for a transient transfection.
Since the ribozyme catalyzes its own removal from the RNA molecule, this type
of ribozyme is present only when a replication competent polynucleotide is a
DNA molecule.

The replication competent polynucleotide of the invention can be present
in a vector. When a replication competent polynucleotide is present in a vector
the polynucleotide is DNA, including the 5' non-translated RNA and the 3' non-
translated RNA, and, if present, the second coding sequence. Methods for
cloning and/or inserting hepatitis C virus sequences into a vector are known to the
art (see, e.g., Yanagi et al., Proc. Natl. Acad. Sci. USA, 94, 8738-8743 (1997);
and Rice et al., (U.S. Patent 6,127,116)). Such constructs are often referred to as
molecularly cloned laboratory strains, and an HCV that is inserted into a vector is
often referred to as a cDNA clone of the HCV. If the RNA encoded by the HCV
is able to replicate in vivo, the HCV present in the vector is referred to as an
infectious cDNA clone. A vector is a replicating polynucleotide, such as a
plasmid, phage, cosmid, or artificial chromosome to which another polynucleotide
may be attached so as to bring about the replication of the attached
polynucleotide. A vector can provide for further cloning (amplification of the
polynucleotide), i.e., a cloning vector, or for expression of the polypeptide
encoded by the coding region, i.e., an expression vector. The term vector
includes, but is not limited to, plasmid vectors, viral vectors, cosmid vectors, or
artificial chromosome vectors. Preferably the vector is a plasmid. Preferably the
vector is able to replicate in a prokaryotic host cell, for instance Escherichia coli.
Preferably, the vector can integrate in the genomic DNA of a eukaryotic cell.

An expression vector optionally includes regulatory sequences operably
linked to the replication competent polynucleotide such that it is transcribed to
produce RNA molecules. These RNA molecules can be used, for instance, for
introducing a replication competent polynucleotide into a cell that is in an animal
or growing in culture. The terms "introduce" and "introducing" refer to providing
a replication competent polynucleotide to a cell under conditions such that the
polynucleotide is taken up by the cell in such a way that it can then replicate. The
replication competent polynucleotide can be present in a virus particle, or can be a
nucleic acid molecule, for instance, RNA. The invention is not limited by the use
of any particular promoter, and a wide variety are known. Promoters act as
regulatory signals that bind RNA polymerase in a cell to initiate transcription of a
downstream (3' direction) HCV. The promoter used in the invention can be a
constitutive or an inducible promoter. A preferred promoter for the production of
replication competent polynucleotide as an RNA molecule is a T7 promoter.

The present invention includes methods for identifying a replication
competent polynucleotide, including detecting and/or selecting for cells
containing a replication competent polynucleotide. Typically, the cells used in
this aspect of the invention are primate or human cells growing in culture. Useful
cultured cells will support the replication of the polynucleotides of the present
invention, and include primary human or chimpanzee hepatocytes, peripheral
mononuclear cells, cultured human lymphoid cell lines (for instance lines
expressing B-cell and T-cell markers such as Bjab and Molt-4 cells), and
continuous cell lines derived from such cells, including HPBMa10-2 and Daudi
(Shimizu et al., J. Gen. Virol., 79, 1383-1386 (1998)), and MT-2 (Kato et al.,
Biochem. Biophys. Res. Commun., 206, 863-869 (1995)). Other useful cells
include those derived from human hepatoma cells, for instance, Huh-7 (see, for
instance, Lohmann et al., Science, 285, 110-113 (1999)), Huh-7.5 (see, for
instance, Blight et al., J. Virol., 76, 13001-13014 (2002), and Blight et al., J. Virol.,
77, 3183-3190 (2003)), HepG2 and IMY-N9 (Date et al., J. Biol. Chem., 279,
22371-22376 (2004)), and PH5CH8 (Ikeda et al., Virus Res., 56, 157-167 (1998)).
In general, useful cells include those that support replication of HCV RNA,
including, for instance, replication of the HCV encoded by pCV-H77C,
replication of the HCV encoded by pHCV-N as modified by Beard et al.
(Hepatol., 30, 316-324 (1999)), or replication of such an HCV modified to contain
one or more adaptive mutations.

In some aspects of the invention, the cultured cell includes a
polynucleotide that includes a coding region, the expression of which is controlled
by a transactivator. Such a coding region is referred to herein as a transactivated
coding region. A transactivated coding region encodes a marker, such as a
detectable marker, for example, secretory alkaline phosphatase (SEAP), an
example of which is encoded by nucleotides 748-2239 of SEQ ID NO:7 (see Fig.
9). Typically, a cultured cell that includes a polynucleotide having a
transactivated coding region is used in conjunction with a replication competent
polynucleotide of the present invention that includes a coding region encoding a
transactivator.

The polynucleotide that includes the transactivated coding region can be
present integrated into the genomic DNA of the cell, or present as part of a vector
that is not integrated. Methods of modifying a cell to contain an integrated DNA
are known to the art (see, for instance, Lemon et al., U.S. Published Application
US 2003 0125541, and Yi et al., Virol., 302, 197-210 (2002)).

Operably linked to the transactivated coding region is an operator
sequence. The interaction of a transactivator with an operator sequence can alter
transcription of the operably linked transactivated coding region. In those aspects
of the invention where a transactivator increases transcription, there is typically
low transcription, or essentially no transcription, of the transactivated coding
region in the absence of a transactivator. An operator sequence can be present
upstream (5') or downstream (3') of a transactivated coding region. An operator
sequence can be a promoter, or can be a nucleotide sequence that is present in
addition to a promoter.

In some aspects of the invention, the operator sequence that is operably
linked to a transactivated coding sequence is an HIV long terminal repeat (LTR).
An example of an HIV LTR is depicted at nucleotides 1-719 of SEQ ID NO:7.
Also included in the present invention are operator sequences having similarity to
nucleotides 1-719 of SEQ ID NO:7. The similarity between two nucleotide
sequences may be determined by aligning the residues of the two polynucleotides
(i.e., the nucleotide sequence of the candidate operator sequence and the
nucleotide sequence of nucleotides 1-719 of SEQ ID NO:7) to optimize the
number of identical nucleotides along the lengths of their sequences; gaps in either
or both sequences are permitted in making the alignment in order to optimize the
number of shared nucleotides, although the nucleotides in each sequence must
nonetheless remain in their proper order. A candidate operator sequence can be
isolated from a cell, or can be produced using recombinant techniques, or
chemically or enzymatically synthesized. Preferably, two nucleotide sequences
are compared using the Blastn program of the BLAST 2 search algorithm, as
described by Tatusova, et al. (FEMS Microbiol Lett 1999, 174:247-250), and
available at http://www.ncbi.nlm.nih.gov/gorf/bl2.html. Preferably, the default
values for all BLAST 2 search parameters are used, including reward for match =
1, penalty for mismatch = -2, open gap penalty = 5, extension gap penalty = 2, gap
x_dropoff = 50, expect = 10, wordsize = 11, and filter on. In the comparison of
two nucleotide sequences using the BLAST search algorithm, structural similarity
is referred to as "identities." Preferably, an operator sequence includes a
nucleotide sequence having a structural similarity with the nucleotides 1-719 of
SEQ ID NO:7 of at least about 90 %, at least about 95 %, or at least about 99 %
identity. Typically, an operator sequence having structural similarity with the
nucleotides 1-719 of SEQ ID NO:7 has transcriptional activity. Whether such an
operator sequence has transcriptional activity can be determined by evaluating the
ability of the operator sequence to alter transcription of an operably linked coding
sequence in response to the presence of a polypeptide having tat activity,
preferably, a polypeptide including the amino acids of SEQ ID NO:5 or SEQ ID
NO:10.
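
By way of illustration only, the identity comparison described above can be sketched in Python as follows. The sketch assumes that the two sequences have already been aligned (for instance by Blastn) and are supplied as equal-length strings in which "-" marks a gap; the function names percent_identity and meets_identity are hypothetical and are not part of BLAST. Identity is taken here as the number of matching columns divided by the alignment length, which is an assumption about how the "identities" figure is computed.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the columns of a gapped pairwise alignment.

    Assumption: both inputs are already aligned, have equal length,
    and use '-' for gap positions. A gap never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity(aligned_a, aligned_b, threshold=90.0):
    """True if the alignment reaches the given percent identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

A candidate operator sequence aligned against nucleotides 1-719 of SEQ ID NO:7 could be passed to meets_identity with a threshold of 90.0, 95.0, or 99.0 to test the levels of identity recited above. Transcriptional activity would still need to be determined experimentally.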

A selecting agent may be used to inhibit the replication of cultured cells
that support the replication of polynucleotides of the present invention. Examples
of selecting agents include antibiotics, including kanamycin, ampicillin,
chloramphenicol, tetracycline, neomycin, and formulations of phleomycin D1. A
selecting agent can act to prevent replication of a cell, or kill a cell, while the
agent is present and the cell does not express a molecule that provides resistance
to the selecting agent. Typically, the molecule providing resistance to a selecting
agent is expressed in the cell by a replication competent polynucleotide of the
present invention. Alternatively, the molecule providing resistance to a selecting
agent is expressed by the cell but the expression of the molecule is controlled by a
replication competent polynucleotide of the present invention that is present in the
cell. The concentration of the selecting agent is typically chosen such that a cell
does not replicate if it does not contain a molecule providing resistance to a
selecting agent. The appropriate concentration of a selecting agent varies
depending on the particular selecting agent, and can be easily determined by one
having ordinary skill in the art using known techniques.

When a polynucleotide is introduced into a cell that is growing in culture,
the polynucleotide can be introduced using techniques known to the art. Such
techniques include, for instance, liposome and non-liposome mediated
transfection. Non-liposome mediated transfection methods include, for instance,
electroporation.

In some aspects of the invention, when a replication competent
polynucleotide is identified using cultured cells, its ability to replicate may be
verified by introducing the replication competent polynucleotide into a cell
present in an animal, preferably a chimpanzee. When the cell is present in the
body of an animal, the replication competent polynucleotide can be introduced by,
for instance, subcutaneous, intramuscular, intraperitoneal, intravenous, or
percutaneous intrahepatic administration, preferably by percutaneous intrahepatic
administration. Methods for determining whether a replication competent
polynucleotide is able to replicate in a chimpanzee are known to the art (see, for
example, Yanagi et al., Proc. Natl. Acad. Sci. USA, 94, 8738-8743 (1997)). In
general, the demonstration of infectivity is based on the appearance of the virus in
the circulation of the chimpanzee over the days and weeks following the
intrahepatic injection of the replication competent polynucleotide. The presence of
the virus can be confirmed by reverse transcription-polymerase chain reaction
(RT-PCR) detection of the viral RNA, by inoculation of a second chimpanzee
with transfer of the hepatitis C virus infection as indicated by the appearance of
liver disease and seroconversion to hepatitis C virus in ELISA tests, or possibly by
the immunologic detection of components of the hepatitis C virus (e.g., the core
protein) in the circulation of the inoculated animal. It should be noted that
seroconversion by itself is generally not a useful indicator of infection in an
animal injected with a viral RNA produced using a molecularly cloned laboratory
strain, as this RNA may have immunizing properties and be capable of inducing
HCV-specific antibodies to proteins translated from an input RNA that is non-
replicating. Similarly, the absence of seroconversion does not exclude the
possibility of viral replication and infection of a chimpanzee with HCV.

Whether a polynucleotide is replication competent can be determined
using methods known to the art, including methods that use nucleic acid
amplification to detect the result of increased levels of replication. For instance,
transient transfection of a cell with a replication competent polynucleotide permits
measurement of the production of additional polynucleotides. Methods for
transient transfection of a cell with a replication competent polynucleotide and for
assay of subsequent replication are known to the art. In some aspects of the
invention, another method for detecting a replication competent polynucleotide
includes measuring the production of viral particles by a cell. The measurement
of viral particles can be accomplished by passage of supernatant from media
containing a cell culture that may contain a replication competent polynucleotide,
and using the supernatant to infect a second cell. Detection of the polynucleotide
or viral particles in the second cell indicates the initial cell contains a replication
competent polynucleotide. The production of infectious virus particles by a cell
can also be measured using antibody that specifically binds to an HCV viral
particle. As used herein, an antibody that can "specifically bind" an HCV viral
particle is an antibody that interacts only with the epitope of the antigen (e.g., the
viral particle or a polypeptide that makes up the particle) that induced the
synthesis of the antibody, or interacts with a structurally related epitope.
"Epitope" refers to the site on an antigen to which specific B cells and/or T cells
respond so that antibody is produced. An epitope could include about 3 amino
acids in a spatial conformation which is unique to the epitope. Generally an
epitope includes at least about 5 such amino acids, and more usually, consists of at
least about 8-10 such amino acids. Antibodies to HCV viral particles can be
produced as described herein.

In another aspect, identifying a replication competent polynucleotide
includes incubating a cultured cell that includes a polynucleotide of the present
invention. In those aspects of the invention where the replication competent
polynucleotide includes a second coding region encoding a detectable marker,
cells containing the replication competent polynucleotide can be identified by
observing individual cells that contain the detectable marker. Alternatively, if the
detectable marker is secreted by the cell, the presence of the marker in the
medium in which the cell is incubated can be detected. Methods for observing the
presence or absence of a detectable marker in a cell or in liquid media are known
to the art.

Another aspect of the invention provides for the positive selection of cells
that include a replication competent polynucleotide. In this aspect of the
invention, a replication competent polynucleotide typically includes a second
coding sequence encoding a selectable marker, and the cell which includes the
replication competent polynucleotide is incubated in the presence of a selecting
agent. Those cells that can replicate in the presence of the selecting agent contain
a polynucleotide that is replication competent. The cells that can replicate are
detected by allowing resistant cells to grow in the presence of the selecting agent,
and observing, for instance, the presence of colonies and/or the expression of a
marker, such as SEAP.

In some aspects, the method may further include isolating virus particles
from the cells that contain a replication competent polynucleotide and exposing a
second cell to the isolated virus particle under conditions such that the virus
particle is introduced to the cell. After providing time for expression of the
selectable marker, the second cell is then incubated with the selecting agent. The
presence of a cell that replicates indicates the replication competent
polynucleotide produces infectious virus particles.

In another aspect, the invention provides a method for detecting a
replication competent polynucleotide. The method includes incubating a cell that
contains a replication competent polynucleotide of the present invention. The
polynucleotide may include a second coding region encoding a selectable or
detectable marker. Optionally, the polynucleotide may include a transactivator
that interacts with the operator sequence present in the cell. In this aspect, the
cell may include a transactivated coding region and an operator sequence operably
linked to the transactivated coding region. The method further includes detecting
the presence of increased amounts of the replication competent polynucleotide, or
the presence or absence of the marker encoded by the second coding sequence or
the transactivated coding region present in the cell. The presence of increased
amounts of the replication competent polynucleotide or the marker indicates the
cell includes a replication competent polynucleotide.

The methods described above for identifying a replication competent
polynucleotide can also be used for identifying a variant replication competent
polynucleotide, i.e., a replication competent polynucleotide that is derived from a
replication competent polynucleotide of the present invention. A variant
replication competent polynucleotide may have a faster replication rate than the
parent or input polynucleotide. The method takes advantage of the inherently
high mutation rate of RNA replication. It is expected that during continued
culture of a replication competent polynucleotide in cultured cells, the
polynucleotide of the present invention may mutate, and some mutations will
result in polynucleotides with greater replication rates. The method includes
identifying a cell that has greater expression of a polypeptide encoded by a
replication competent polynucleotide. A polynucleotide of the present invention
that replicates at a faster rate will result in more of the polynucleotide in the cell,
or will result in more of the polypeptide(s) that is encoded by the second coding
region present in the polynucleotide. For instance, when a replication competent
polynucleotide encodes a selectable marker, a cell containing a variant
polynucleotide having a greater replication rate will be resistant to higher levels of
an appropriate selecting agent. When a polynucleotide encodes a transactivator, a
cell containing a variant polynucleotide having a greater replication rate than the
parent or input polynucleotide will express higher amounts of the transactivated
coding region that is present in the cell.

A cDNA molecule of a variant replication competent polynucleotide can
be cloned using methods known to the art (see, for instance, Yanagi et al., Proc.
Natl. Acad. Sci. USA, 94, 8738-8743 (1997)). The nucleotide sequence of the
cloned cDNA can be determined using methods known to the art, and compared
with that of the input RNA. This allows identification of mutations that have
occurred in association with passage of the replication competent polynucleotide
in cell culture. For example, using methods known to the art, including long-range
RT-PCR, extended portions of a variant replication competent polynucleotide
genome can be obtained. Multiple clones could be obtained from each segment of
the genome, and the dominant sequence present in the culture determined.
Mutations that are identified by this approach can then be reintroduced into the
background of the cDNA encoding the parent or input polynucleotide.
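
The comparison of a cloned sequence with the input sequence can be illustrated by the following minimal Python sketch. It assumes, for simplicity, that the parent and variant sequences are translated polypeptide segments of equal length that differ only by substitutions (no insertions or deletions), and that the polyprotein position of the first residue is known. The function name list_substitutions and the short example sequence are hypothetical and are not taken from any HCV sequence.

```python
def list_substitutions(parent, variant, first_position=1):
    """List substitutions between two equal-length sequences.

    Each change is reported as <parent residue><position><variant residue>,
    e.g. 'K1691R'. Assumption: the sequences are already in register,
    so insertions and deletions are not handled.
    """
    if len(parent) != len(variant):
        raise ValueError("sequences must have equal length")
    changes = []
    for position, (p, v) in enumerate(zip(parent, variant), start=first_position):
        if p != v:
            changes.append(f"{p}{position}{v}")
    return changes


# Hypothetical four-residue segment starting at polyprotein position 1690.
print(list_substitutions("MKQS", "MRQS", first_position=1690))
```

Sequences that differ in length would first need to be aligned, and the dominant sequence among multiple clones would need to be determined before such a comparison is meaningful.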

The present invention also provides methods for identifying a compound
that inhibits replication of a replication competent polynucleotide. The method
includes contacting a cell containing a replication competent polynucleotide with
a compound and incubating the cell under conditions that permit replication of the
replication competent polynucleotide in the absence of the compound. After a
period of time sufficient to allow replication of the polynucleotide, the replication
competent polynucleotide is detected. A decrease in the presence of replication
competent polynucleotide in the cell contacted with the compound relative to the
presence of replication competent polynucleotide in a cell not contacted by the
compound indicates the compound inhibits replication of the polynucleotide. A
compound that inhibits replication of such a polynucleotide includes compounds
that completely prevent replication, as well as compounds that decrease
replication. Preferably, a compound inhibits replication of a replication competent
polynucleotide by at least about 50%, more preferably at least about 75%, most
preferably at least about 95%.
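
As a non-limiting illustration, the relative decrease described above can be computed as in the following Python sketch. The sketch assumes that replication is reported as a single positive signal per cell sample (for instance an RNA copy number or a marker activity) measured in a cell contacted with the compound and in a cell not contacted with the compound. The function names percent_inhibition and inhibition_tier are hypothetical.

```python
def percent_inhibition(treated, untreated):
    """Percent decrease of the treated signal relative to the untreated control."""
    if untreated <= 0:
        raise ValueError("untreated control signal must be positive")
    return 100.0 * (1.0 - treated / untreated)


def inhibition_tier(pct):
    """Map a percent inhibition onto the preferred levels of 50%, 75%, and 95%."""
    if pct >= 95.0:
        return "at least about 95%"
    if pct >= 75.0:
        return "at least about 75%"
    if pct >= 50.0:
        return "at least about 50%"
    return "below 50%"
```

For example, a treated signal of 25 units against an untreated control of 100 units corresponds to 75% inhibition. The choice of readout and any normalization for cell number or cytotoxicity are left to the skilled person.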






WO 2005/053516 



PCT/US2004/040120 



The compounds added to a cell can be a wide range of molecules, and the
type of compound is not a limiting aspect of the invention. Compounds include,
for instance, a polyketide, a non-ribosomal peptide, a polypeptide, a
polynucleotide (for instance an antisense oligonucleotide or ribozyme), other
organic molecules, or a combination thereof. The sources for compounds to be
screened can include, for example, chemical compound libraries, fermentation
media of Streptomycetes, other bacteria and fungi, and extracts of eukaryotic or
prokaryotic cells. The time at which the compound is added to the cell is also not
a limiting aspect of the invention. For instance, the compound can be added to a
cell that contains a replication competent polynucleotide. Alternatively, the
compound can be added to a cell before or at the same time that the replication
competent polynucleotide is introduced to the cell.

Typically, the ability of a compound to inhibit replication of a replication
competent polynucleotide is measured using methods described herein. For
instance, methods that use nucleic acid amplification to detect the amount of a
replication competent polynucleotide in a cell can be used. Alternatively, methods
that detect or select for a marker encoded by a replication competent
polynucleotide or encoded by a cell containing a replication competent
polynucleotide can be used.

In some aspects of the invention, the replication competent polynucleotide
of the invention can be used to produce viral particles. Preferably, the viral
particles are infectious. For instance, a cell that includes a replication competent
polynucleotide can be incubated under conditions that allow the polynucleotide to
replicate, and the viral particles that are produced can be isolated using methods
routine and known to the art. The viral particles can be used as a source of virus
particles for various assays, including evaluating methods for inactivating
particles, excluding particles from serum, identifying a neutralizing compound,
and as an antigen for use in detecting anti-HCV antibodies in an animal. An
example of using a viral particle as an antigen includes use as a positive control in
assays that test for the presence of anti-HCV antibodies.

For instance, the activity of compounds that neutralize or inactivate the 
particles can be evaluated by measuring the ability of the molecule to prevent the 
particles from infecting cells growing in culture or in cells in an animal. 
Inactivating compounds include detergents and solvents that solubilize the
envelope of a viral particle. Inactivating compounds are often used in the
production of blood products and cell-free blood products. Examples of 
compounds that can be neutralizing include a polyketide, a non-ribosomal peptide, 
a polypeptide (for instance, an antibody), a polynucleotide (for instance, an 
antisense oligonucleotide or ribozyme), or other organic molecules. Preferably, a
neutralizing compound is an antibody, including polyclonal and monoclonal 
antibodies, as well as variations thereof including, for instance, single chain 
antibodies and Fab fragments. 

Viral particles produced by a replication competent polynucleotide of the
invention can be used to produce antibodies. Laboratory methods for producing
polyclonal and monoclonal antibodies are known in the art (see, for instance,
Harlow E. et al., Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory
Press, Cold Spring Harbor (1988) and Ausubel, F.M., ed., Current Protocols in
Molecular Biology (1994)), and include, for instance, immunizing an animal with
a virus particle. Antibodies produced using the viral particles of the invention can
be used to detect the presence of viral particles in biological samples. For
instance, the presence of viral particles in blood products and cell-free blood
products can be determined using the antibodies.

The present invention further includes methods of treating an animal 
including administering neutralizing antibodies. The antibodies can be used to 
prevent infection (prophylactically) or to treat infection (therapeutically), and 
optionally can be used in conjunction with other molecules used to prevent or treat 
infection. The neutralizing antibodies can be mixed with pharmaceutically 
acceptable excipients or carriers. Suitable excipients include but are not limited to 
water, saline, dextrose, glycerol, ethanol, or the like and combinations thereof. In 
addition, if desired, neutralizing antibodies and pharmaceutically acceptable 
excipients or carriers may contain minor amounts of auxiliary substances such as 
wetting or emulsifying agents, pH buffering agents, and/or adjuvants which 
enhance the effectiveness of the neutralizing antibodies. Such additional 
formulations and modes of administration as are known in the art may also be 
used. 

The virus particles produced by a replication competent polynucleotide of
the invention can be used as a source of viral antigen to measure the presence and
amount of antibody present in an animal. Assays are available that measure the
presence in an animal of antibody directed to HCV, and include, for instance,
ELISA assays and recombinant immunoblot assays. These types of assays can be
used to detect whether an animal has been exposed to HCV, and/or whether the
animal may have an active HCV infection. However, these assays do not use
virus particles, but rather individual or multiple viral polypeptides expressed from
recombinant cDNA that are not in the form of virus particles. Hence they are
generally unable to detect potentially important antibodies directed against surface
epitopes of the envelope polypeptides, nor are they typically measures of
functionally important viral neutralizing antibodies. Such antibodies are generally
detected with the use of infectious virus particles, such as those that are produced
in this system. The use of infectious viral particles as antigen in assays that detect
the presence of specific antibodies by virtue of their ability to block the infection
of cells with HCV viral particles, or that possibly bind to whole virus particles in
an ELISA assay or radioimmunoassay, will allow the detection of functionally
important viral neutralizing antibodies.

The present invention also provides a kit for identifying a compound that
inhibits replication of a replication competent polynucleotide. The kit includes,
in a suitable packaging material, a replication competent polynucleotide as
described herein and a cell that contains a polynucleotide including a
transactivated coding sequence encoding a detectable marker and an operator
sequence operably linked to the transactivated coding sequence. Optionally, other
reagents such as buffers and solutions needed to practice the invention are also
included. Instructions for use of the packaged materials are also typically included.

As used herein, the phrase "packaging material" refers to one or more
physical structures used to house the contents of the kit. The packaging material
is constructed by well known methods, preferably to provide a sterile,
contaminant-free environment. The packaging material may include a label which
indicates that the replication competent polynucleotide can be used for identifying
a compound that inhibits replication of such a polynucleotide. In addition, the
packaging material may contain instructions indicating how the materials within
the kit are employed. As used herein, the term "package" refers to a solid matrix
or material such as glass, plastic, and the like, capable of holding within fixed
limits the replication competent virus and the vertebrate cell.

The present invention is illustrated by the following examples. It is to be
understood that the particular examples, materials, amounts, and procedures are to
be interpreted broadly in accordance with the scope and spirit of the invention as
set forth herein.

EXAMPLES 

Materials and Methods 

Cells. Huh7 cells were grown in Dulbecco's modified Eagle's medium
(Gibco BRL, Carlsbad, CA) supplemented with 10% fetal calf serum, penicillin
and streptomycin. En5-3 is a clonal cell line derived from Huh7 cells by stable
transformation with the plasmid pLTR-SEAP (Yi et al., Virology, 304, 197-210
(2002)). These cells were cultured in Dulbecco's modified Eagle's medium (Gibco
BRL) supplemented with 10% fetal calf serum, 2 µg/ml blasticidin (Invitrogen),
penicillin and streptomycin. Cell lines were passaged once or twice per week.
G418 at a concentration of 250 µg/ml was used to select colonies from En5-3 cells
transfected with replicon RNAs containing 1a sequences.

Plasmids. The plasmid pBpp-Htat2ANeo was constructed by replacing the
BsrGI-XbaI fragment of pBpp-Ntat2ANeo/SI (identical to Ntat2ANeo/SI as
described by Yi et al., Virology, 304, 197-210 (2002)) with the
analogous segment of pH77c (GenBank AF011751) (Yanagi et al., Proc Natl
Acad Sci USA, 94, 8738-43 (1997)) engineered to contain a BsrGI site at the
corresponding location by Quick-Change (Stratagene, La Jolla, CA) mutagenesis.
This fragment swap results in the NS3-NS5B sequence in pBpp-Htat2ANeo being
identical to that of pH77c, with the exception of the RNA encoding the N-terminal
75 amino acid residues of NS3 that retains the genotype 1b Con1 sequence. Since
Bpp-Ntat2ANeo/SI was originally engineered to contain the genotype 1a 5'
nontranslated RNA (5'NTR) sequence (Yi et al., Virology, 304, 197-210 (2002)),
the resulting pBpp-Htat2ANeo construct possesses both a genotype 1a 5'NTR and
1a 3'NTR sequence. Overlapping PCR was used to fuse an anti-genomic hepatitis
delta ribozyme sequence directly to the 3' end of the genotype 1a 3'NTR, in order
to generate a self-cleaving 3' sequence with the exact 3' terminal nucleotide of
HCV (Perrotta and Been, Nucleic Acids Res, 24, 1314-21 (1996)). Derivatives of
pBpp-Htat2ANeo containing the adaptive mutations K1691R or S2204I were
created by Quick-Change (Stratagene) mutagenesis.

To construct pBpp-H34A-Ntat2ANeo/SI, an EcoRI restriction site was
created in pBpp-Ntat2ANeo/SI near the 3' end of the NS4A coding region by
Quick-Change mutagenesis. After digestion of the resulting plasmid with BsrGI
and EcoRI, the excised HCV segment was replaced with the equivalent sequence
from pH77c, which had been amplified by PCR using primer pairs containing
terminal BsrGI and EcoRI sites, respectively. To construct the plasmid Hpp-
H34A-Ntat2ANeo, DNA fragments representing the encephalomyocarditis virus
(EMCV) internal ribosome entry site (IRES) sequence and the genotype 1a H77c
NS3 protein-coding sequence were fused by overlapping PCR. The resulting
fragment was digested with KpnI at a site located within the EMCV IRES and
BsrGI at the site created within the modified pH77c NS3 region (see above), then
inserted in place of the corresponding fragment in pBpp-H34A-Ntat2ANeo/SI.
The adaptive mutations, Q1067R or G1188R, were introduced into pHpp-H34A-
Ntat2ANeo/SI in a similar fashion, using cDNA fragments prepared by RT-PCR
of template RNAs isolated from independent G418-resistant replicon cell lines
selected after transfection of En5-3 cells with Hpp-H34A-Ntat2ANeo RNA.
pHtat2ANeo/SI was constructed by replacing the BsrGI-XbaI fragment of pHpp-
H34A-Ntat2ANeo/SI with that of pBpp-Htat2ANeo/SI. A similar strategy was
used to construct pHtat2ANeo/QR/SI, pHtat2ANeo/KR/SI, and
pHtat2ANeo/QR/KR/SI. Quick-Change (Stratagene) mutagenesis was used to
introduce the P1496L, F2080V and K2040R mutations into replicon constructs
derived from pHtat2ANeo/SI.

Modified pH77c plasmids containing adaptive mutations were created by
replacing the BsrGI-XbaI fragment with the corresponding fragment from the
pHtat2ANeo plasmid derivative containing the indicated mutation, except for the
Q1067R mutation, which was introduced by Quick-Change (Stratagene)
mutagenesis. Each mutation was confirmed by sequence analysis. For use as
controls, replication-incompetent subgenomic and genome-length genotype 1a
constructs (Htat2ANeo/QR/VI/KR/KR5A/SI/AAG and
H77/QR/VI/KR/KR5A/SI/AAG) were created by replacing residues 2737-2739 of
NS5B ('GDD') with 'AAG' using a similar strategy. Each mutation was confirmed
by sequence analysis.

RNA transcription and transfection. RNA was synthesized with T7
MEGAScript reagents (Ambion, Austin, TX), after linearizing plasmids with
XbaI. Following treatment with RNase-free DNase to remove template DNA and
precipitation of the RNA with lithium chloride, the RNA was transfected into
Huh7 cells or En5-3 cells by electroporation. Briefly, 5 µg RNA was mixed with
2 x 10^6 cells suspended in 500 µl phosphate buffered saline, in a cuvette with a
gap width of 0.2 cm (Bio-Rad). Electroporation was with two pulses of current
delivered by the Gene Pulser II electroporation device (Bio-Rad), set at 1.5 kV,
25 µF, and maximum resistance. For transient replication assays, no G418 was
added to the media. Transfected cells were transferred to two wells of a 6-well
tissue culture plate, and culture medium was removed completely every 24 hrs
and saved at 4°C for subsequent SEAP assay. The cells were washed twice with
PBS prior to re-feeding with fresh culture medium. Since the culture medium was
replaced every 24 hours in these transient assays, the SEAP activity measured in
these fluids reflected the daily production of SEAP by the cells. Cells were split
5 days after transfection. Samples of media were stored at 4°C until assayed for
SEAP activity at the conclusion of the experiment.
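
Because the medium is replaced every 24 hours, each SEAP reading is a daily production value that can be compared directly with the reading from a replication-defective control on the same day. The following Python sketch illustrates one way such a comparison might be tabulated. The function names, the example readings, the two-fold threshold and the starting day are hypothetical choices made for illustration; they are not values or criteria taken from the experiments described here.

```python
def fold_over_background(replicon, control):
    """Return the day-by-day ratio of replicon SEAP signal to control signal.

    Both arguments are lists of daily luminometer readings (arbitrary units),
    one entry per 24-hour medium collection, starting with day 1.
    """
    if len(replicon) != len(control):
        raise ValueError("both series must cover the same days")
    if any(c <= 0 for c in control):
        raise ValueError("control readings must be positive")
    return [r / c for r, c in zip(replicon, control)]


def shows_sustained_expression(folds, threshold=2.0, from_day=3):
    """True if every reading from `from_day` onward is at least `threshold`.

    The threshold and starting day are illustrative assumptions only.
    """
    late = folds[from_day - 1:]
    return bool(late) and all(f >= threshold for f in late)


# Hypothetical readings: an initial burst from input RNA in both samples,
# followed by sustained signal only in the replicating sample.
replicating = [400.0, 300.0, 600.0, 1200.0, 2400.0]
defective = [400.0, 200.0, 100.0, 100.0, 100.0]
folds = fold_over_background(replicating, defective)
print(folds)
print(shows_sustained_expression(folds))
```

A ratio near 1 on the first days with a rising ratio afterwards corresponds to the pattern described in the text: an early burst from translation of input RNA, then sustained expression only when the RNA replicates.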

Alkaline phosphatase assay. SEAP activity was measured in 10 µl
aliquots of transfected cell supernatant culture fluids using the Phospha-Light
Chemiluminescent Reporter Assay (Applied Biosystems/Tropix, Foster City, CA)
with the manufacturer's suggested protocol reduced in scale. The luminescent
signal was read using a TD-20/20 Luminometer (Turner Designs, Inc., Sunnyvale,
CA).

Sequence analysis of cDNA from replicating HCV RNAs. HCV RNA was
extracted from cells, converted to cDNA and amplified by PCR as described
previously (Yi et al., J Virol, 77, 57-68 (2003)). First-strand cDNA synthesis was
carried out with Superscript II reverse transcriptase (Gibco-BRL); pfu-Turbo
DNA polymerase (Stratagene) was used for PCR amplification of the DNA. The
amplified DNAs were subjected to direct sequencing using an ABI 9600
automatic DNA sequencer.

In vitro translation. In vitro transcribed RNA, prepared as described
above, was used to program in vitro translation reactions in rabbit reticulocyte
lysate (Promega, Madison, WI). Approximately 1 µg RNA, 2 µl of [35S]-
methionine (1,000 Ci/mmol at 10 mCi/ml), and 1 µl of an amino acid mixture
lacking methionine were included in each 50 µl reaction mixture. Translation was
carried out at 30°C for 90 minutes. Translation products were separated by SDS-
PAGE followed by autoradiography or PhosphorImager (Molecular Dynamics)
analysis.
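
The amount of label in each reaction follows from the stated volume, radioactive concentration and specific activity. The short Python sketch below works through that unit conversion; the function name is hypothetical, and the calculation ignores radioactive decay since the reference date of the stock.

```python
def label_per_reaction(volume_ul, conc_mci_per_ml, specific_activity_ci_per_mmol):
    """Return (activity in microcuries, amount in picomoles) of labeled methionine."""
    # 1 ul of a 1 mCi/ml stock contains 1 uCi, so ul x mCi/ml gives uCi.
    activity_uci = volume_ul * conc_mci_per_ml
    # uCi -> Ci (x 1e-6), divide by Ci/mmol -> mmol, then mmol -> pmol (x 1e9).
    amount_pmol = activity_uci * 1000.0 / specific_activity_ci_per_mmol
    return activity_uci, amount_pmol


activity, amount = label_per_reaction(2, 10, 1000)
reaction_volume_ul = 50
# pmol per ul is numerically equal to micromolar.
concentration_um = amount / reaction_volume_ul
print(activity, amount, concentration_um)
```

With the values given in the text, this corresponds to 20 µCi (20 pmol) of labeled methionine per reaction, or roughly 0.4 µM in the 50 µl reaction volume.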

Indirect immunofluorescence. Cells were grown on chamber slides until
70-80% confluent, washed 3 times with PBS, and fixed in methanol/acetone (1:1
v/v) for 10 min at room temperature. A 1:20 dilution of a primary, murine
monoclonal antibody to core or NS5A (Maine Biotechnology Services, Portland,
ME) was prepared in PBS containing 3% bovine serum albumin, and incubated
with the fixed cells for 1 hour at room temperature. Following additional washes
with PBS, specific antibody binding was detected with a goat anti-mouse IgG
FITC-conjugated secondary antibody (Sigma, St. Louis, Missouri) diluted 1:70.
Cells were washed with PBS, counterstained with DAPI, and mounted in
Vectashield mounting medium (Vector Laboratories, Burlingame, CA) prior to
examination by a Zeiss AxioPlan2 Fluorescence microscope.

Northern analysis for HCV RNA. Replicon-bearing cells were seeded into
10 cm dishes at a density of 5 x 10^5 cells/dish, and RNA was harvested 4 days
later. Total cellular RNA was extracted with Trizol reagent (Gibco-BRL) and
quantified by spectrophotometry at 260 nm. Thirty µg of the total RNA extracted
from each well was loaded onto a denaturing agarose-formaldehyde gel, subjected
to electrophoresis and transferred to positively-charged Hybond-N+ nylon
membranes (Amersham-Pharmacia Biotech) using reagents provided with the
NorthernMax Kit (Ambion). RNAs were immobilized on the membranes by UV-
crosslinking. The membrane was hybridized with a mixture of [32P]-labeled
antisense riboprobe complementary to the 3'-end of the HCV NS5B sequence
(nucleotides 8990-9275) derived from pH77c or pHCV-N, and the hybridized
probe was detected by exposure to X-ray film.
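
Loading a fixed mass of RNA per lane requires converting the absorbance reading into a concentration. The Python sketch below shows that arithmetic under the standard approximation that one A260 unit corresponds to about 40 µg/ml of RNA in a 1 cm light path. The function names, the example absorbance and the dilution factor are hypothetical and are not taken from the experiments described here.

```python
# Standard approximation for single-stranded RNA in a 1 cm path length.
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_concentration_ug_per_ul(a260, dilution_factor):
    """Concentration of the undiluted RNA stock in ug/ul."""
    if a260 < 0 or dilution_factor <= 0:
        raise ValueError("absorbance must be >= 0 and dilution factor > 0")
    ug_per_ml = a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor
    return ug_per_ml / 1000.0


def volume_for_mass_ul(mass_ug, conc_ug_per_ul):
    """Volume of stock (ul) that contains the requested mass of RNA."""
    if conc_ug_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return mass_ug / conc_ug_per_ul


# Hypothetical reading: A260 of 0.5 measured on a 1:100 dilution of the stock.
stock = rna_concentration_ug_per_ul(0.5, 100)
load_volume = volume_for_mass_ul(30, stock)
print(stock, load_volume)
```

In this illustrative case the stock is 2 µg/µl, so 15 µl would deliver the 30 µg loaded per lane.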

Results 

Transient replication of 1a replicon containing chimeric NS3-coding
sequence. In contrast to genotype 1b HCV, several previous reports suggest that it
is difficult to generate subgenomic genotype 1a replicons that are capable of
efficient replication in Huh7 cells (Blight et al., Science, 290, 1972-4 (2000), Guo
et al., J Virol, 75, 8516-23 (2001), Ikeda et al., J Virol, 76, 2997-3006 (2002),
Lanford et al., J Virol, 77, 1092-104 (2003)). Similar results were encountered
with a dicistronic SEAP reporter replicon constructed from the H77c infectious
molecular clone (Yanagi et al., Proc Natl Acad Sci USA, 94, 8738-43 (1997)) that
encoded both the HIV tat protein and neomycin phosphotransferase in the
upstream cistron. The organization of this latter replicon, Htat2ANeo/SI (Fig. 1),
was similar to that of the efficiently replicating, genotype 1b Bpp-Ntat2ANeo/SI
replicon (Fig. 1), referred to previously simply as "Ntat2ANeo/SI" (Yi et al.,
Virology, 304, 197-210 (2002)). Most of the HCV polyprotein-coding sequence in
Bpp-Ntat2ANeo/SI was derived from the genotype 1b HCV-N strain of HCV
(Beard et al., Hepatol., 30, 316-24 (1999)), but the "Bpp" prefix used here and
throughout this communication refers to the presence of 225 nucleotides (nts) of
sequence that are derived from the Con1 strain of HCV at the extreme 5' end of
the polyprotein coding region ("pp" indicates the 5' proximal protease-coding
region, Fig. 1). In contrast, all of the HCV sequence in Htat2ANeo/SI (Fig. 1) is
derived from the genotype 1a H77c virus, including both the 5' NTR and 3' NTR
sequences. Unlike Bpp-Ntat2ANeo/SI RNA, Htat2ANeo/SI RNA did not
transduce the selection of G418-resistant colonies, nor induce secretion of SEAP
above that observed with a replication-incompetent NS5B-deletion mutant
(ΔGDD) when transfected into En5-3 cells (stably transformed Huh7 cells that
express SEAP under control of the HIV long terminal repeat promoter) (Yi et al.,
Virology, 304, 197-210 (2002)) in a transient replication assay. This was the case
even though the replicon was engineered to contain the genotype 1b adaptive
mutation, S2204I, within NS5A (Fig. 1). The absence of apparent replication of
Htat2ANeo/SI RNA was striking given the fact that it was derived from a well-
documented infectious molecular clone of the H77c strain of HCV (Yanagi et al.,
Proc Natl Acad Sci USA, 94, 8738-43 (1997)).

Recent reports suggest that the EMCV IRES-driven translation of the
second cistron in dicistronic, subgenomic RNAs such as those shown in Fig. 1
may be reduced when the translated RNA sequence is derived from genotype 1a
virus, rather than genotype 1b (Gu et al., J Virol, 77, 5352-9 (2003), Guo et al., J
Virol, 75, 8516-23 (2001), Lanford et al., J Virol, 77, 1092-104 (2003)). However,
even when translation of the second cistron is rendered more efficient by replacing
the 5' 225 nts of the genotype 1a NS3 sequence with related sequence from the
Con1 genotype 1b virus, replication typically has not been observed when the
remainder of the replicon sequence is derived from a genotype 1a virus (Guo et
al., J Virol, 75, 8516-23 (2001), Lanford et al., J Virol, 77, 1092-104 (2003)).
However, Gu et al. (J Virol, 77, 5352-9 (2003)) recently described the
successful selection of a replication competent, chimeric replicon in which the 5'
225 nts of the NS3 coding sequence were derived from genotype 1b virus, and the
remainder of the second cistron from genotype 1a HCV (construction of chimeric
replicons being simplified by a unique BsrGI site within the genotype 1b Con1
virus sequence, 225 nts downstream from the 5' end of the NS3 region). This
replicon also contained 5'NTR sequence derived from genotype 1b virus, and had
a single base change within the genotype 1a 3'NTR sequence. The results of Gu et
al. (J Virol, 77, 5352-9 (2003)) suggest that the inclusion of the Con1
sequence at the 5' end of the NS3 region may in some way facilitate replication of
the 1a RNA. This hypothesis is strengthened by observations made with genotype
1b replicons derived from HCV-N. Those described previously, including Bpp-
Ntat2ANeo/SI RNA, were constructed by ligation of HCV-N sequence to a Con1
replicon at the BsrGI site (Guo et al., J Virol, 75, 8516-23 (2001), Ikeda et al., J
Virol, 76, 2997-3006 (2002), Yi et al., Virology, 304, 197-210 (2002)), and thus
they contain 5' proximal NS3 sequence (proximal protease sequence or 'pp', Fig.
1) derived from the Con1 virus. Although this chimeric Con1/HCV-N RNA
replicates significantly more efficiently than the originally-described Con1
replicons, the replacement of the 5' proximal NS3 sequence in Bpp-Ntat2ANeo/SI
with sequence from HCV-N (resulting in Npp-Ntat2ANeo/SI) virtually ablated its
replication phenotype in transient transfection assays, although it remained
possible to select G418-resistant colonies at a low frequency following
transfection.

To formally assess the ability of the 5' proximal genotype 1b NS3
sequence to enhance genotype 1a RNA replication, the 5' 225 nts of NS3 coding
region in Htat2ANeo/SI were replaced with the Con1 sequence, generating Bpp-
Htat2ANeo/SI (Fig. 1). The construct was also modified by replacing the XbaI
restriction site at the 3' end of the HCV sequence with the hepatitis delta virus
ribozyme sequence (Perrotta and Been, Nucleic Acids Res, 24, 1314-21 (1996)).
We have shown previously that the presence of the 4 extraneous nts at the 3' end
of the replicon RNA that results from run-off transcription of XbaI-digested
plasmid DNA reduces the replication competence of genotype 1b RNAs by 2-3
fold (Yi and Lemon, RNA, 9, 331-45 (2003)). The inclusion of the ribozyme
resulted in self-cleaving RNA transcripts capable of generating the exact 3'
terminal HCV RNA sequence. Nonetheless, this modified Bpp-Htat2ANeo/SI
RNA still remained incapable of inducing the expression of SEAP in transfected
En5-3 cells beyond that observed following transfection of the ΔGDD RNA.
Transfection resulted only in an initial burst in SEAP expression due to translation
of the input replicon RNA, without the sustained SEAP expression that is
indicative of RNA replication (Fig. 2). However, the Bpp-Htat2ANeo/SI RNA
was capable of transducing the selection of G418-resistant cell colonies
supporting replication of the RNA over a period of 3-4 weeks following
transfection of the cells.

The sequence of replicon RNAs extracted from two independent G418-
resistant cell clones selected following the transfection of En5-3 cells with Bpp-
Htat2ANeo RNA was analyzed. A single Lys to Arg mutation located within the
NS4A region, at residue 1691 (K1691R) of the polyprotein, was identified in
both cell clones. This residue is located just beyond the 3' limits
of the NS4A cofactor peptide sequence which participates in forming a
noncovalent complex with NS3 and enhances its protease activity (Wright-
Minogue et al., J Hepatol, 32, 497-504 (2000), Yao et al., Structure Fold Des, 7,
1353-63 (1999)). To determine whether the K1691R mutation facilitated
replication of the chimeric genotype 1b/1a RNA in En5-3 cells, this mutation was
introduced into the parental Bpp-Htat2ANeo/SI construct, thereby creating Bpp-
Htat2ANeo/KR/SI (see Table 2 for a list of all adaptive mutations identified in
these studies, as well as the symbols used to indicate their presence in constructs).
As shown in Fig. 2, this single mutation significantly enhanced the replication
capacity of the RNA, allowing replication to be detected by a sustained increase in
SEAP expression following transient transfection of En5-3 cells in the absence of
G418. Since the level of SEAP production has been shown to correlate
closely with intracellular replicon RNA abundance in this reporter system (Yi et
al., Virology, 304, 197-210 (2002), Yi et al., J Virol, 77, 57-68 (2003)), we
conclude that K1691R is an adaptive mutation. Interestingly, this mutation has
been shown previously to confer an enhanced replication phenotype on Con1
replicons (Lohmann et al., J Virol, 77, 3007-19 (2003)), whereas the sequence of
HCV-N is naturally Arg at this position (Beard et al., Hepatol., 30, 316-24
(1999)). Our results stand in contrast to those reported by Gu et al. (J
Virol, 77, 5352-9 (2003)), who identified several mutations within the NS3,
NS5A, and NS5B sequences of chimeric genotype 1b-1a RNAs. None of these
mutations appeared to enhance the ability of the chimeric RNA to replicate or
transduce colony selection.

The 5' 225 nts of the genotype 1a NS3 sequence down-modulate replicon
amplification. The results described above, as well as those of Gu et al. (J
Virol, 77, 5352-9 (2003)), suggest that the first 225 nts of the genotype 1a NS3
sequence have a negative impact on the replication of subgenomic HCV replicons.
This could occur by down modulation of EMCV IRES-directed translation of the
nonstructural proteins (Guo et al., J Virol, 75, 8516-23 (2001)), or by directly
influencing replication itself, possibly by influencing an NS3-related function. To
address this issue, the identification of additional adaptive mutations capable of
compensating for the presence of the 5' proximal genotype 1a protease sequence
was sought. Thus additional chimeric replicons containing the entire genotype 1a
NS3/4A sequence within the background of Bpp-Ntat2ANeo/SI (Hpp-H34A-
Ntat2ANeo/SI, Fig. 3A) were constructed. Also constructed was a variant of this
construct in which the first 225 nts of the NS3/4A sequence were replaced with
Con1 sequence (Bpp-H34A-Ntat2ANeo/SI, Fig. 3A). In both chimeric RNAs, the
sequence extending from NS4B to the 3'NTR was derived entirely from the
genotype 1b HCV-N strain. While the replicon containing the entire genotype 1a
NS3/4A sequence (Hpp-H34A-Ntat2ANeo/SI) did not show evidence of replication
in a transient transfection assay, the variant containing the first 225 nts of the
Con1 sequence (Bpp-H34A-Ntat2ANeo/SI) replicated as well as the reference Bpp-
Ntat2ANeo/SI replicon (Fig. 3B). This result confirms that the 5' 225 nucleotides
of the genotype 1a NS3 sequence have a negative effect on RNA replication in
En5-3 cells, and also indicates that the downstream genotype 1a NS3/4A sequence
functions well in this context.

Interestingly, despite the lack of detectable RNA replication in the
transient assay, selection of stable G418-resistant cell clones following
transfection of Hpp-H34A-Ntat2ANeo/SI RNA was possible. Sequencing of
replicon RNAs derived from two independent cell clones revealed only a single
potentially adaptive mutation in each: Q1067R and G1188R, both of which are
located within RNA encoding the NS3 protease (Fig. 3A). The Q1067R mutation
is of particular interest, since it is within the 5' 225 nucleotides of the NS3
region. When introduced into Hpp-H34A-Ntat2ANeo/SI, both the Q1067R and (to
a lesser extent) the G1188R mutations enhanced replication of the RNA to a level
that was detectable in the transient assay (Fig. 3B), indicating that both are
adaptive mutations and capable of compensating, in part, for the presence of the
genotype 1a protease sequence. However, neither of these mutations, when
introduced into a replicon containing only genotype 1a sequence (Htat2ANeo/SI),
was able to enhance replication to the point where it was evident in the transient
assay (Htat2ANeo/QR/SI, Fig. 4).
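
The relationship between polyprotein residue numbers and the 5' 225 nts of the NS3 region can be checked with simple codon arithmetic (225 nts correspond to 75 codons). The Python sketch below is illustrative only. The protein boundaries in `NS_PROTEINS` are an assumption: they are commonly cited coordinates for the H77 polyprotein and are not stated in this document. The function names are likewise hypothetical.

```python
import re

# ASSUMPTION: commonly cited H77 polyprotein coordinates (first, last residue).
# These boundaries are not given in the text and are used for illustration.
NS_PROTEINS = {
    "NS3": (1027, 1657),
    "NS4A": (1658, 1711),
    "NS4B": (1712, 1972),
    "NS5A": (1973, 2420),
    "NS5B": (2421, 3011),
}


def parse_mutation(label):
    """Split a label such as 'Q1067R' into (original, position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def locate(position):
    """Return (protein name, residue number within that protein)."""
    for name, (first, last) in NS_PROTEINS.items():
        if first <= position <= last:
            return name, position - first + 1
    raise ValueError(f"position {position} is outside the NS3-NS5B region")


def in_5prime_proximal_ns3(label, n_nts=225):
    """True if the mutated codon lies within the first n_nts of the NS3 region."""
    _, position, _ = parse_mutation(label)
    name, local = locate(position)
    return name == "NS3" and local <= n_nts // 3


for label in ("Q1067R", "G1188R", "K1691R"):
    _, pos, _ = parse_mutation(label)
    print(label, locate(pos), in_5prime_proximal_ns3(label))
```

Under these assumed boundaries, Q1067R falls at residue 41 of NS3, inside the 75-codon proximal segment, while G1188R falls at residue 162 of NS3, outside that segment.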

Transient replication of a genotype 1a replicon in normal Huh7 cells. To
determine whether the K1691R and Q1067R mutations might work cooperatively
to confer a transient replication phenotype on the genotype 1a replicon RNA, both
were introduced into Htat2ANeo and the ability of the modified RNA to
replicate in transfected En5-3 cells was assessed. Surprisingly, the combination
of the K1691R and Q1067R mutations (in addition to the S2204I mutation in
NS5A) conferred a relatively robust replication phenotype on the genotype 1a
RNA, such that replication was easily detectable in the transient transfection
assay using the SEAP reporter system (Htat2ANeo/QR/KR/SI, Fig. 4B). Using an
approach similar to that taken in the preceding experiments, an additional
adaptive mutation (F2080V) within the NS5A-coding region was subsequently
identified when cells transfected with Htat2ANeo/QR/KR/SI RNA were subjected
to G418 selection pressure. This mutation resulted in slightly greater replication
efficiency when introduced into the genotype 1a replicon containing K1691R and
Q1067R in addition to S2204I (Htat2ANeo/QR/KR/FV/SI, Fig. 4B). However,
F2080V had relatively little effect when added to replicons containing only
K1691R or Q1067R (in addition to S2204I) (Fig. 4B). Minimally increased
secretion of SEAP above the ΔGDD background was observed during the first 5
days after transfection with Htat2ANeo/KR/FV/SI, but this was no longer apparent
after 6 days. The replication phenotype of Htat2ANeo/QR/FV/SI was
indistinguishable from that of the replication incompetent ΔGDD mutant in this
assay (Fig. 4B). These results are summarized in Fig. 4C.

To facilitate a comparison of these results with those reported previously
by Blight et al. (J Virol, 77, 3181-90 (2003)), the adaptive P1496L
mutation, which this group identified within the helicase domain of NS3 following
transfection of a genotype 1a replicon into the highly permissive Huh7 subline,
Huh-7.5, was introduced into the replicon. Consistent with the previous report, a
1a replicon bearing the P1496L mutation demonstrated only minimal evidence of
replication in the transient assay (which utilizes En5-3 cells that are comparable
to normal Huh7 cells in terms of their permissiveness for HCV RNA replication)
(Htat2ANeo/PL/SI, Fig. 4B). The addition of the NS5A mutation, F2080V, failed
to noticeably enhance the replication capacity of this RNA (Htat2ANeo/PL/FV/SI,
Fig. 4B). SEAP expression induced by genotype 1a replicons containing both
Q1067R and K1691R was approximately 10-fold that induced by replicons
containing P1496L. Since SEAP production from En5-3 cells correlates closely
with the intracellular abundance of replicon RNA (Yi et al., Virology, 304, 197-
210 (2002)), these results suggest that the protease domain mutations make a
greater contribution to replication competence of the genotype 1a replicon.

Adaptive mutations within NS3 do not affect EMCV IRES-driven
translation of the second cistron. As mentioned above, previous reports indicate
that the EMCV IRES-driven translation of the second cistron is reduced in
genotype 1a replicons in comparison to replicons containing the genotype 1b Con1
sequence (Gu et al., J Virol, 77, 5352-9 (2003), Guo et al., J Virol, 75, 8516-23
(2001), Lanford et al., J Virol, 77, 1092-104 (2003)). Although the mechanism is
uncertain, the effect appears to be due to the genotype 1a sequence encoding the
amino terminus of NS3. Since the adaptive Q1067R mutation is located within
this region, we asked whether it or other mutations that enhance 1a replicon
amplification do so by improving EMCV IRES-driven translation of the HCV
nonstructural proteins. To test this hypothesis, in vitro translation reactions were
programmed with genotype 1b and 1a replicon RNAs containing various adaptive
mutations, and the production of proteins encoded by the second cistron was
compared with that of neomycin phosphotransferase produced from the first
cistron. As shown in Fig. 5, the synthesis of NS3 was modestly reduced with
replicons containing genotype 1a H77c sequence in the 5' proximal protease
region (compare NS3 abundance in lanes 4-8 with that in other lanes). However,
it was not increased by any of the adaptive mutations, including Q1067R. This
result indicates that the difficulty of establishing replication competent 1a
replicons is more likely due to an intrinsic property of the 1a sequence than to an
incompatibility of the HCV and EMCV sequences in this region leading to reduced
activity of the EMCV IRES. Nonetheless, the reduced level of translation of the
genotype 1a nonstructural proteins that is evident in Fig. 5 may contribute to the
poor replication phenotype of these RNAs.

An additional adaptive NS5A mutation further augments replication
competence. Although the F2080V mutation in NS5A provided only a slight
additional replication advantage to subgenomic genotype 1a RNAs containing the
Q1067R, K1691R and S2204I mutations (Fig. 4), additional mutations were
subsequently identified concurrently near the C-terminus of NS3 (V1655I) and
within NS5A (K2040R) in RNAs replicating within a G418-resistant cell line
selected following transfection with the subgenomic Htat2ANeo/QR/KR/SI
replicon. As shown in Fig. 6, both of these mutations enhanced the replication
capacity of genotype 1a RNA. Addition of the V1655I mutation resulted in a
modest enhancement of Htat2ANeo/QR/KR/SI replication, leading to a replication
phenotype slightly better than that observed with the addition of the F2080V
mutation. In contrast, the addition of the K2040R mutation in NS5A resulted in a
dramatic increase in replication competence, rendering the replication phenotype
of the genotype 1a RNA equivalent to that of the standard genotype 1b HCV-N
replicon used in these studies, Bpp-Ntat2ANeo/SI (Fig. 6B). A genotype 1a
replicon containing both of these adaptive mutations in addition to those
identified earlier replicated with slightly greater efficiency than this reference
genotype 1b RNA in the transient assay (Fig. 6B, Htat2ANeo/QR/VI/KR/KR5A/SI).
These results were confirmed in independent experiments.
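
Construct names in this section append slash-separated symbols to a backbone name to indicate which mutations are present. The Python sketch below decodes such names. The symbol-to-mutation mapping is inferred from how the names are paired with mutations in the surrounding text (Table 2 itself is not reproduced here), so it should be read as an assumption; the function name is hypothetical.

```python
# ASSUMPTION: mapping inferred from usage in the text, not copied from Table 2.
SYMBOLS = {
    "QR": "Q1067R",    # NS3 protease domain
    "PL": "P1496L",    # NS3 helicase domain
    "VI": "V1655I",    # near the C-terminus of NS3
    "KR": "K1691R",    # NS4A
    "KR5A": "K2040R",  # NS5A
    "FV": "F2080V",    # NS5A
    "SI": "S2204I",    # NS5A
    "AAG": "GDD->AAG (NS5B residues 2737-2739)",  # replication-incompetent control
}


def decode_construct(name):
    """Split 'Backbone/SYM1/SYM2/...' into (backbone, [mutations])."""
    backbone, *symbols = name.split("/")
    mutations = []
    for symbol in symbols:
        if symbol not in SYMBOLS:
            raise KeyError(f"unknown symbol {symbol!r} in construct {name!r}")
        mutations.append(SYMBOLS[symbol])
    return backbone, mutations


print(decode_construct("Htat2ANeo/QR/VI/KR/KR5A/SI"))
print(decode_construct("Bpp-Ntat2ANeo/SI"))
```

Raising an error on unknown symbols, rather than skipping them, is a deliberate choice here so that a mistyped construct name is not silently decoded as a different genotype.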

Robust replication of genome-length genotype 1a RNA with adaptive
mutations. Encouraged by the above results, we assessed the in vitro replication
competence of genome-length, genotype 1a H77c RNA engineered to contain the
adaptive mutations described above. As with the dicistronic, subgenomic RNAs,
we placed the hepatitis delta ribozyme sequence at the 3' end of the cloned
infectious cDNA sequence in pH77c in order to generate RNA transcripts
containing an exact HCV 3' terminus. As these genomic RNAs encoded no
selectable marker or reporter protein product, their replication was assessed in
transfected Huh7 and En5-3 cells by northern blot analysis in comparison with
related subgenomic RNAs. Subgenomic and genome-length replication-
incompetent H77 mutant RNAs, in which the GDD motif had been replaced with
AAG, served as negative controls for this experiment. For En5-3 cells transfected
with the subgenomic RNAs, we also determined levels of SEAP expression.

As expected, the unmodified H77c RNA showed no evidence of
replication, even though it has been shown previously to be infectious in
chimpanzees when inoculated into liver (Fig. 7, compare lane 7 with the
replication defective 1a genomic RNA in lane 11). The introduction of the
Q1067R (NS3) mutation, alone or in combination with S2204I (NS5A), was
insufficient to confer a detectable level of replication in Huh7 cells. However,
when all three mutations were introduced (Q1067R, K1691R and S2204I), the
H77c RNA acquired a relatively efficient replication phenotype with readily
detectable amplification of the RNA in northern blots of cell lysates prepared 4
days after transfection of either Huh7 or En5-3 cells (Fig. 7, lane 8). Replication
of the genome-length RNA was slightly increased by the further addition of the
F2080V (NS5A) mutation (Fig. 7, lane 9). However, consistent with the data
presented in Fig. 6, the inclusion of both the V1655I mutation in NS3 and the
K2040R mutation conferred a substantially more robust replication phenotype on
genome-length H77c, when present in combination with other adaptive mutations
in NS3, NS4A and NS5A (H77c/QR/VI/KR/KR5A/SI, Fig. 7, compare lanes 10
and 11). This experiment thus confirmed the adaptive effects of these mutations.

Northern blotting indicated that the replication capacity of genome-length
genotype 1a RNAs containing adaptive mutations was significantly greater than
that of the comparable subgenomic, dicistronic genotype 1a replicons, for which
the RNA signal 4 days after transfection was low and near the limits of detection
in northern blots (Fig. 7, compare lanes 3 to 6 with lanes 8 to 11). These findings
are consistent with those reported previously by Blight et al. (J Virol, 77, 3181-90
(2003)), and indicate that the inclusion of heterologous sequences in the
dicistronic replicons impairs RNA replication competence. Subgenomic replicon
RNA was detected unambiguously only in cells transfected with
Htat2ANeo/QR/VI/KR/KR5A/SI, the RNA that generated the highest level of
SEAP expression (Fig. 7, compare lanes 5 and 6).

As a further measure of the replication competence of these modified
genome-length H77c RNAs, we also examined transfected En5-3 cells for the
presence of core or NS5A proteins using an indirect immunofluorescence method.
Introduction of both the K1691R (NS4A) and S2204I mutations resulted in
detectable antigen expression 4 days after transfection, albeit only in a very low
percentage of cells (less than 0.01%). However, strong expression of both the core
and NS5A proteins was observed in approximately 30% of En5-3 cells 4 days
after transfection of RNA containing all four adaptive mutations. Increased
replication efficiency of genotype 1a RNAs correlated with a greater proportion of
cells supporting the replication of HCV RNA, evidenced by the presence of viral
antigen.

Discussion 

Subgenomic, dicistronic, selectable HCV RNA replicons derived from
genotype 1b viruses replicate efficiently in cultured cells (Blight et al., Science,
290:1972-1974 (2000), Guo et al., J. Virol., 75:8516-8523 (2001), Ikeda et al., J.
Virol., 76:2997-3006 (2002), Krieger et al., J. Virol., 75:4614-4624 (2001),
Lohmann et al., J. Virol., 75:1437-1449 (2001), and Lohmann et al., Science
285:110-113 (1999)). These novel RNAs have facilitated the study of HCV RNA
replication and substantially accelerated antiviral drug discovery efforts. The
Huh7 cell line, derived from a human hepatoma, appears to be uniquely permissive
and supportive of the replication of these HCV RNAs, although recent studies
suggest that other types of cells may also be permissive for HCV RNA replication
(Zhu et al., J. Virol., 77:9204-9210 (2003)). However, despite the success of
genotype 1b replicons, it has been difficult to generate RNAs that replicate
efficiently in any cell type from other genotypes of HCV, including genotype 1a
(Blight et al., Science, 290:1972-1974 (2000), Guo et al., J. Virol., 75:8516-8523
(2001), Ikeda et al., J. Virol., 76:2997-3006 (2002), and Lanford et al., J. Virol.,
77:1092-104 (2003)). This surprising observation indicates that significant
biological differences exist between genotype 1a and 1b viruses, despite the fact
that the nucleotide sequences of genotype 1a viruses are relatively closely related
to those of genotype 1b (~90-93% identity). This biological difference raises the
likelihood that antiviral agents that are found to be active against the genotype 1b
virus may have significantly lesser activity against genotype 1a viruses.
Considering these observations and the relatively high genetic variability that
exists between different HCV genotypes, the development of cell culture systems
supporting replication of viral RNAs from other genotypes will be important for
validating in vitro efficacy of candidate antiviral agents across a range of
genetically distinct HCV genotypes, as well as developing a better overall
understanding of these viruses.
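
A percent identity figure such as the one quoted above is obtained by counting matching positions in an alignment of two sequences. The Python sketch below shows one simple way to do this for sequences that have already been aligned. The function name, the choice to skip gapped columns, and the toy sequences are assumptions made for illustration; the toy sequences are not HCV sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical positions between two pre-aligned sequences.

    Columns in which either sequence has a gap are excluded from the
    comparison (one of several conventions in use).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy example: one mismatch in ten aligned positions.
print(percent_identity("ACGUACGUAC", "ACGUACGAAC"))
```

Different gap conventions give slightly different percentages, so a published figure can only be reproduced if the same convention and alignment are used.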

Genotype 1a viruses are the most prevalent types of HCV in the United
States, and like genotype 1b virus they are relatively refractory to treatment with
interferon (Fried et al., N Engl J Med, 347, 975-82 (2002), McHutchison and
Fried, Clin Liver Dis, 7, 149-61 (2003)). Thus far, a detectable level of genotype
1a RNA replication has been reported only in specially isolated, highly permissive
Huh7 human hepatoma cell sublines (e.g., Huh-7.5 cells) generated by eliminating
the replication of genotype 1b RNA replicons from established replicon cell lines
using interferon-α in vitro (Blight et al., J Virol, 77, 3181-90 (2003), Grobler et
al., J Biol Chem, 278, 16741-6 (2003)). These previously described genotype 1a
RNAs possess cell culture-adaptive mutations that enhance their replication in
these special cells, including those selected during the isolation of antibiotic-
resistant cell lines containing these 1a replicons (Blight et al., J Virol, 77, 3181-90
(2003), Grobler et al., J Biol Chem, 278, 16741-6 (2003)). However, the published
reports suggest that these previously described genotype 1a RNAs do not replicate
to a detectable level in standard Huh7 cells, and that their capacity for replication
in cultured cells is thus limited. In contrast, genotype 1a HCV RNAs are reported
here that replicate in a highly efficient manner in normal Huh7 cells.

Our results suggest that the highly efficient replication of genotype 1a
RNAs requires at least three adaptive mutations located within the NS3, NS4A
and NS5A proteins. It is evident that these mutations are mutually reinforcing in
their ability to enhance the replication of the genotype 1a RNAs, even though they
were identified individually under different circumstances. It was found that the
introduction of the S2204I mutation in NS5A, which is known to promote the
replication of genotype 1b virus RNAs in Huh7 cells (Blight et al., Science,
290:1972-4 (2000)), was not sufficient for subgenomic replicons composed
entirely of the genotype 1a sequence to initiate replication in Huh7 cells.
However, it made possible the selection of G418-resistant cell colonies following
transfection of a chimeric replicon RNA, in which sequence from the infectious
molecular clone of the genotype 1a H77c virus encoded all of the nonstructural
proteins other than the N-terminal 75 amino acid residues of NS3, which were
derived from the genotype 1b Con1 sequence (Fig. 1, Bpp-Htat2ANeo/SI). The
HCV RNAs replicating in these cells contained a single mutation within the
NS4A-coding region (K1691R) that enhanced the replication capacity of the
original chimeric replicon RNA (Fig. 2). These results suggest that a restriction to
the replication of genotype 1a virus in Huh7 cells may reside within the serine
protease domain of NS3, since substitution of the N-terminal domain of the
genotype 1a protease with that from the Con1 genotype 1b virus allowed the
initiation of replication and the selection of G418-resistant cells. A similar
conclusion can be drawn from the results reported by Gu et al. (Gu et al., J Virol,
77, 5352-9 (2003)). Thus, it is interesting that the adaptive mutation K1691R
resides within NS4A very close to the surface of the NS3/4A protease complex
that it helps to form (Fig. 8).

In an effort to better understand this restriction, a second chimeric replicon
containing the complete genotype 1a NS3/4A sequence within the background of a
genotype 1b replicon was constructed. This RNA (Hpp-H34A-Ntat2ANeo/SI) did
not undergo detectable replication in the transient transfection system utilized in
these studies (Fig. 3). However, it was capable of transducing the selection of
G418-resistant cell colonies following transfection and antibiotic selection.
Analysis of the sequence of the HCV RNAs replicating within these cells
identified a second, cell culture-adaptive mutation within the N-terminal region of
the NS3 protease (Q1067R), providing further evidence that a primary restriction
to replication of genotype 1a virus resides within this domain. Yet additional
evidence for this comes from the replication phenotype of the Bpp-H34A-
Ntat2ANeo/SI replicon, which also contains all of the genotype 1a NS3/4A
sequence except for the N-terminal 75 amino acid residues, and which
demonstrated a robust replication phenotype in the transient transfection assay.
Thus there appears to be no restriction to replication deriving from inclusion of
the genotype 1a NS3 helicase domain, nor for that matter any part of the protease
domain except for its N-terminus.

Further work demonstrated that the K1691R and Q1067R mutations
worked cooperatively: neither by itself was capable of conferring the capacity for
efficient replication on a replicon composed entirely of genotype 1a sequence, but
a combination of the two (in addition to the genotype 1b S2204I adaptive
mutation) resulted in RNA replication that could be readily detected in the
transient transfection assay (Fig. 4). That these mutations should act cooperatively
in their effects on replication, as indicated by the data shown in Fig. 4, is
consistent with their location in the polyprotein, since the NS4A protease cofactor
domain interacts primarily with residues within the N-terminal domain of the NS3
protease (Wright-Minogue et al., J Hepatol, 32, 497-504 (2000), Yao et al.,
Structure Fold Des, 7, 1353-63 (1999)).
Additional adaptive mutations were identified and verified through an
iterative series of experiments involving RNA transfection, isolation of G418-
resistant cells, and analysis of the sequence of efficiently replicating genotype 1a
RNAs. Also demonstrated was that the S2204I mutation did indeed facilitate the
replication of the genotype 1a RNA, as its removal from the efficiently replicating
subgenomic RNAs substantially reduced their replication competence in the
transient transfection assay. The genotype 1a adaptive mutations identified herein
are summarized in Table 2. They can be grouped functionally into two groups:
K2040R, F2080V, and S2204I, which are all located within NS5A (a common site
of genotype 1b adaptive mutations), and Q1067R, G1188R, V1655I, and K1691R,
which are all located in or otherwise associated with the protease domain of NS3.
While to some extent solvent exposed, both G1188R and Q1067R are close to the
active site of the protease (Fig. 8), and would both add a significant charge
difference to the active face of the protein. V1655I is particularly interesting. It is
located near the extreme C-terminus of the NS3 protein, downstream of the
helicase domain, and close to the protease active site in the crystal structure of the
NS3/4A complex (Yao et al., Structure Fold Des, 7, 1353-63 (1999)). In the P3
position of the NS3/4A cleavage site, V1655 is certain to play a role in substrate
recognition during the cis-active cleavage of the polyprotein at the NS3/4A
junction and it remains within the substrate-binding pocket in the crystal structure.
The potential impact of the K1691R mutation, within NS4A, on the conformation
of the protease active site is much less certain, but it is in close proximity to the
NS4A cofactor domain, as mentioned above, and intercalation of this domain into
the NS3 protease is well known to modulate the activity of the protease.
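
For readers who handle these substitutions computationally, the notation used above (parental residue, polyprotein position, replacement residue) can be represented as in the following minimal Python sketch. The grouping follows the two functional groups described in the text. The names `Mutation`, `parse_mutation`, `apply_mutation`, and `ADAPTIVE_MUTATIONS`, the toy ten-residue sequence, and the toy label `T3A` are illustrative assumptions only; they are not part of the disclosure and do not describe any laboratory method.

```python
import re
from typing import NamedTuple


class Mutation(NamedTuple):
    wild_type: str  # residue expected in the parental polyprotein
    position: int   # 1-based amino acid position in the polyprotein
    mutant: str     # residue introduced by the substitution


# Grouping as described in the text (hypothetical data structure).
ADAPTIVE_MUTATIONS = {
    "NS5A": ["K2040R", "F2080V", "S2204I"],
    "NS3/4A protease-associated": ["Q1067R", "G1188R", "V1655I", "K1691R"],
}


def parse_mutation(label: str) -> Mutation:
    """Split a label such as 'Q1067R' into its three parts."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    return Mutation(match.group(1), int(match.group(2)), match.group(3))


def apply_mutation(polyprotein: str, label: str) -> str:
    """Return a copy of the sequence carrying the substitution.

    Raises ValueError if the position is out of range or the parental
    residue does not match the label, which guards against numbering
    mismatches between sequences.
    """
    mutation = parse_mutation(label)
    index = mutation.position - 1
    if not 0 <= index < len(polyprotein):
        raise ValueError(f"position {mutation.position} is outside the sequence")
    if polyprotein[index] != mutation.wild_type:
        raise ValueError(
            f"expected {mutation.wild_type} at {mutation.position}, "
            f"found {polyprotein[index]}"
        )
    return polyprotein[:index] + mutation.mutant + polyprotein[index + 1:]


if __name__ == "__main__":
    # Toy input: a ten-residue stand-in sequence and a made-up label.
    toy_sequence = "MSTNPKPQRK"
    print(apply_mutation(toy_sequence, "T3A"))
    for group, labels in ADAPTIVE_MUTATIONS.items():
        print(group, [parse_mutation(label).position for label in labels])
```

Checking the parental residue before substituting is a deliberate choice in this sketch: position numbers refer to the full-length polyprotein, so a subgenomic or differently numbered sequence would fail loudly rather than be altered at the wrong site.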

Significantly, all of these NS3 and NS4A mutations are located at some
distance from other genotype 1a adaptive mutations in NS3 that have been
described in the literature (see Fig. 8). These mutations, located at S1222, A1226
and P1496, are all within the helicase domain of NS3 (Blight et al., J Virol, 77,
3181-90 (2003), Grobler et al., J Biol Chem, 278, 16741-6 (2003)). While on the
surface of the protein, they are located on the side opposite the solvent exposed
surfaces containing the G1188, V1655, and Q1067 residues (Fig. 8). Thus, it is
possible that they facilitate genotype 1a RNA replication by a different
mechanism than those mutations that cluster near the active site of the protease. At
least the P1496L mutation identified by both Blight et al. (Blight et al., J Virol, 77,
3181-90 (2003)) and Grobler et al. (Grobler et al., J Biol Chem, 278, 16741-6
(2003)) appears to be substantially less active in conferring replication capacity on
the genotype 1a H77c RNA. This was demonstrated by the lack of detectable
replication of RNA replicons containing this mutation (Htat2ANeo/PL/SI and
Htat2ANeo/PL/FV/SI) in the transient transfection experiment summarized in Fig.
4.

What role could mutations near the active site of the NS3 protease play in
promoting the replication of genotype 1a HCV RNA in Huh7 cells? It is unlikely
that these mutations work by enhancing translation of the nonstructural proteins
under control of the EMCV IRES in the context of the subgenomic replicon, since
we observed no difference in translation of these proteins in vitro in reticulocyte
lysates programmed with these RNAs (Fig. 5). More importantly, they enhance
the replication of genomic H77c RNA lacking any heterologous sequence in Huh7
cells (see Fig. 7). These mutations do not seem likely to promote replication by
favorably influencing the ability of the protease to process the viral polyprotein,
since the polyprotein segment expressed in the Htat2ANeo derivatives is derived
entirely from the same H77c genome, and this replicates very efficiently in
chimpanzee liver. However, this does remain a formal possibility that needs to be
excluded in future studies. It is possible, instead, that these mutations promote
interactions of the NS3/4A complex with specific cellular proteins that play a role
in assembly of the viral replicase complex, or otherwise influence replication by
disabling innate cellular antiviral defenses.

Foy et al. (Foy et al., Science, 300, 1145-8 (2003)) recently demonstrated
that expression of the NS3/4A protease effectively blocked activation of interferon
regulatory factor 3 (IRF3) in Huh7 cells infected with Sendai virus, thereby
preventing the induction of synthesis of interferon-β and other antiviral cytokines.
This immuno-evasive action of NS3 was reversed by a specific ketoamide
inhibitor of the NS3/4A protease, and was dependent upon the protease activity of
NS3/4A, indicating that NS3/4A is likely to cleave a cellular protein involved in
IRF3 signaling following viral infection. While Foy et al. (Foy et al., Science,
300, 1145-8 (2003)) demonstrated that both genotype 1a and genotype 1b
proteases are capable of blocking IRF3 activation, it is intriguing to consider that
the adaptive mutations within NS3/4A may promote its ability to direct such a
cleavage, thereby enhancing replication of the virus by lessening cellular antiviral
defenses.

The second group of adaptive mutations identified within NS5A, K2040R,
F2080V, and S2204I (Table 2), are likely to function in a fashion similar to NS5A
adaptive mutations identified in genotype 1b replicons, which include S2204I.
Although their specific mechanism of action is not known, they may either
promote the ability of NS5A to assemble a functional replicase complex in Huh7
cells, or perhaps augment the immunomodulatory actions that have been proposed
for this viral protein through its interactions with double-stranded RNA stimulated
protein kinase R (PKR) (Gale et al., Clin Diagn Virol, 10, 157-62 (1998)). The
contribution of these adaptive mutations to the replication of the genotype 1a
RNA in these studies appears to be additive to that of the NS3/4A mutations (Figs.
3 and 6), not synergistic as shown for the combination of Q1067R and K1691R
(Fig. 3).

The complete disclosure of all patents, patent applications, and
publications, and electronically available material (including, for instance,
nucleotide sequence submissions in, e.g., GenBank and RefSeq, and amino acid
sequence submissions in, e.g., SwissProt, PIR, PRF, PDB, and translations from
annotated coding regions in GenBank and RefSeq) cited herein are incorporated
by reference. The foregoing detailed description and examples have been given
for clarity of understanding only. No unnecessary limitations are to be understood
therefrom. The invention is not limited to the exact details shown and described,
for variations obvious to one skilled in the art will be included within the
invention defined by the claims.

All headings are for the convenience of the reader and should not be used
to limit the meaning of the text that follows the heading, unless so specified.

Sequence Listing Free Text

SEQ ID NO:1 Nucleotide sequence of Hepatitis C virus strain H77
SEQ ID NO:2 Amino acid sequence of HCV polyprotein encoded by
nucleotides 342 - 9377 of SEQ ID NO:1.
SEQ ID NO:3 Nucleotide sequence of Hepatitis C virus strain H
SEQ ID NO:4 Amino acid sequence of HCV polyprotein encoded by
nucleotides 342 - 9377 of SEQ ID NO:3.
SEQ ID NO:5 HIV tat polypeptide
SEQ ID NO:6 NS3 recognition site
SEQ ID NO:7 Nucleotide sequence of HIV SEAP, HIV long terminal
repeat (LTR) is depicted at nucleotides 1-719, and secretory alkaline phosphatase
is encoded by the nucleotides 748-2239.
SEQ ID NO:8 Nucleotide sequence of a 3' NTR.
SEQ ID NO:9 Nucleotide sequence of a 5' NTR
SEQ ID NO:10 HIV tat polypeptide
SEQ ID NO:11 genomic length hepatitis C virus, genotype 1a
SEQ ID NO:12 HCV polyprotein encoded by the coding region present in
SEQ ID NO:11.
SEQ ID NO:13 nucleotide sequence of Htat2ANeo
SEQ ID NO:14 HCV polyprotein encoded by the coding region present in
SEQ ID NO:13.

What is claimed is:

1. A replication competent polynucleotide comprising:

a 5' non-translated region (NTR), a 3' NTR, and a first coding sequence
present between the 5' NTR and 3' NTR and encoding a hepatitis C virus
polyprotein, wherein the polyprotein comprises an isoleucine at about amino acid
2204, and further comprises an adaptive mutation selected from the group of an
arginine at about amino acid 1067, an arginine at about amino acid 1691, valine at
about amino acid 2080, an isoleucine at about amino acid 1655, an arginine at
about amino acid 2040, an arginine at about amino acid 1188, and a combination
thereof.

2. The replication competent polynucleotide of claim 1 further comprising a 
second coding sequence. 

3. The replication competent polynucleotide of claim 2 wherein the second 
coding sequence encodes a marker. 

4. The replication competent polynucleotide of claim 2 wherein the second 
coding sequence encodes a transactivator. 

5. The replication competent polynucleotide of claim 1 wherein the 5' NTR,
the 3' NTR, and the nucleotide sequence encoding the polyprotein are genotype
1a.

6. The replication competent polynucleotide of claim 1 wherein the hepatitis 
C virus polyprotein is a subgenomic hepatitis C virus polyprotein. 

7. The replication competent polynucleotide of claim 1 wherein the hepatitis
C virus polyprotein comprises cleavage products core, E1, E2, P7, NS2, NS3,
NS4A, NS4B, NS5A, and NS5B.

8. The replication competent polynucleotide of claim 1 further comprising a
nucleotide sequence having cis-acting ribozyme activity, wherein the nucleotide
sequence is located 3' of the 3' NTR.

9. A replication competent polynucleotide comprising: 

a 5' non-translated region (NTR), a 3' NTR, and a first coding sequence
present between the 5' NTR and 3' NTR and encoding a hepatitis C virus
polyprotein comprising cleavage products core, E1, E2, P7, NS2, NS3, NS4A,
NS4B, NS5A, and NS5B, wherein the polyprotein comprises an isoleucine at 
about amino acid 2204, an arginine at about amino acid 1067, and an arginine at 
about amino acid 1691. 

10. The polynucleotide of claim 9 further comprising a second coding 
sequence. 

11. The polynucleotide of claim 10 wherein the second coding sequence
encodes a marker.

12. The polynucleotide of claim 10 wherein the second coding sequence 
encodes a transactivator. 

13. The polynucleotide of claim 9 wherein the 5' NTR, polyprotein, and 3'
NTR are genotype 1a.

14. A method for making a replication competent polynucleotide comprising:

providing a polynucleotide comprising a 5' NTR, 3' NTR, a first coding
sequence present between the 5' NTR and 3' NTR and encoding a hepatitis C virus
polyprotein, wherein the polyprotein comprises a serine at about amino acid 2204,
a glutamine at about amino acid 1067, a lysine at about amino acid 1691, a
phenylalanine at about amino acid 2080, a valine at about amino acid 1655, a
lysine at about amino acid 2040, or a glycine at about amino acid 1188 and
wherein the 5' NTR, polyprotein, and 3' NTR are genotype 1a; and

altering the coding sequence such that the polyprotein encoded thereby
comprises an isoleucine at amino acid 2204, and at least one adaptive mutation
selected from the group consisting of an arginine at about amino acid 1067, an
arginine at about amino acid 1691, a valine at about amino acid 2080, an
isoleucine at about amino acid 1655, an arginine at about amino acid 2040, an
arginine at about amino acid 1188, and a combination thereof.

15. The method of claim 14 wherein the hepatitis C virus polyprotein is a 
subgenomic hepatitis C virus polyprotein. 

16. The method of claim 14 wherein the hepatitis C virus polyprotein
comprises cleavage products core, E1, E2, P7, NS2, NS3, NS4A, NS4B, NS5A,
and NS5B.

17. A replication competent polynucleotide produced by the method of claim 
14. 

18. A method for identifying a compound that inhibits replication of a 
replication competent polynucleotide, the method comprising: 

contacting a cell comprising a replication competent polynucleotide with a 
compound, the replication competent polynucleotide comprising 5'NTR, 3'NTR, 
a first coding sequence present between the 5' NTR and 3' NTR and encoding a 
hepatitis C virus polyprotein, wherein the hepatitis C virus polyprotein comprises 
an isoleucine at about amino acid 2204, and further comprises an adaptive 
mutation selected from the group of an arginine at about amino acid 1067, an 
arginine at about amino acid 1691, valine at about amino acid 2080, an isoleucine 
at about amino acid 1655, an arginine at about amino acid 2040, an arginine at 
about amino acid 1188, and a combination thereof;

incubating the cell under conditions wherein the replication competent 
polynucleotide replicates in the absence of the compound; and 

detecting the replication competent polynucleotide, wherein a decrease of
the replication competent HCV polynucleotide in the cell contacted with the
compound compared to the replication competent polynucleotide in a cell not
contacted with the compound indicates the compound inhibits replication of the
replication competent polynucleotide.

19. The method of claim 18 wherein detecting the replication competent
polynucleotide comprises nucleic acid amplification.

20. The method of claim 18 wherein the replication competent polynucleotide 
further comprises a second coding sequence encoding a marker, and wherein 
detecting the replication competent polynucleotide comprises identifying the 
marker. 

21. The method of claim 18 wherein the replication competent polynucleotide 
further comprises a second coding sequence encoding a transactivator, wherein 
the cell comprises a polynucleotide comprising a transactivated coding sequence 
encoding a detectable marker and an operator sequence operably linked to the 
transactivated coding sequence, wherein the transactivator interacts with the 
operator sequence and alters expression of the transactivated coding sequence, and 
wherein detecting the replication competent polynucleotide in the cell comprises 
detecting the detectable marker encoded by the transactivated coding sequence. 

22. The method of claim 18 wherein the cell is a human hepatoma cell. 

23. The method of claim 18 wherein the hepatitis C virus polyprotein is a
subgenomic hepatitis C virus polyprotein.

24. The method of claim 18 wherein the hepatitis C virus polyprotein
comprises cleavage products core, E1, E2, P7, NS2, NS3, NS4A, NS4B, NS5A,
and NS5B.

25. The method of claim 18 wherein the 5' NTR, polyprotein, and 3' NTR are
genotype 1a.

26. A method for selecting a replication competent polynucleotide, the method 
comprising: 

incubating a cell in the presence of a selecting agent, wherein:
the cell comprises a polynucleotide comprising a 5' non-translated
region (NTR), a 3' NTR, and a first coding sequence present between
the 5' NTR and 3' NTR and encoding a hepatitis C virus polyprotein, and a second
coding sequence, wherein the polyprotein comprises an isoleucine at about amino
acid 2204, and further comprises an adaptive mutation selected from the group of
an arginine at about amino acid 1067, an arginine at about amino acid 1691, valine
at about amino acid 2080, an isoleucine at about amino acid 1655, an arginine at
about amino acid 2040, an arginine at about amino acid 1188, and a combination
thereof;

the second coding sequence encodes a selectable marker 
conferring resistance to the selecting agent; and 

the selecting agent inhibits replication of a cell that does not 
express the selectable marker; and 

detecting a cell that replicates in the presence of the selecting 
agent, wherein the presence of such a cell indicates the polynucleotide is 
replication competent. 

27. The method of claim 26 wherein the selecting agent is an antibiotic. 

28. The method of claim 26 wherein the cell is a human hepatoma cell. 

29. The method of claim 26 wherein the cell is a first cell, the method further 
comprising: 

obtaining a virus particle produced by the first cell; 

exposing a second cell to the isolated virus particle and incubating 
the second cell in the presence of the selecting agent; and 

detecting a second cell that replicates in the presence of the 
selecting agent, wherein the presence of such a cell indicates the replication 
competent polynucleotide in the first cell produces an infectious virus particle. 

30. The method of claim 26 wherein the hepatitis C virus polyprotein is a 
subgenomic hepatitis C virus polyprotein. 

31. The method of claim 26 wherein the hepatitis C virus polyprotein
comprises cleavage products core, E1, E2, P7, NS2, NS3, NS4A, NS4B, NS5A,
and NS5B.



WO 2005/053516    PCT/US2004/040120



32. The method of claim 26 wherein the 5' NTR, polyprotein, and 3' NTR are
genotype 1a.

33. A method for detecting a replication competent polynucleotide, the method 

comprising: 

incubating a cell comprising a replication competent 
polynucleotide, wherein: 

the replication competent polynucleotide comprises a 5' 
non-translated region (NTR), a 3' NTR, and a first coding sequence present 
between the 5'NTR and 3' NTR and encoding a hepatitis C virus polyprotein, and 
a second coding sequence encoding a transactivator, wherein the polyprotein 
comprises an isoleucine at about amino acid 2204, and further comprises an 
adaptive mutation selected from the group of an arginine at about amino acid 
1067, an arginine at about amino acid 1691, valine at about amino acid 2080, an 
isoleucine at about amino acid 1655, an arginine at about amino acid 2040, an 
arginine at about amino acid 1188, and a combination thereof;

the cell comprises a transactivated coding region and an 
operator sequence operably linked to the transactivated coding region; and 

the transactivated coding region encodes a detectable 
marker, wherein the transactivator alters transcription of the transactivated coding 
region; and 

detecting the detectable marker, wherein the presence of the 
detectable marker indicates the cell comprises a replication competent 
polynucleotide. 

34. The method of claim 33 wherein the hepatitis C virus polyprotein is a 
subgenomic hepatitis C virus polyprotein. 

35. The method of claim 33 wherein the hepatitis C virus polyprotein
comprises cleavage products core, E1, E2, P7, NS2, NS3, NS4A, NS4B, NS5A,
and NS5B.

36. The method of claim 33 wherein the 5' NTR, polyprotein, and 3' NTR are
genotype 1a.

37. A kit comprising: 

a replication competent polynucleotide comprising a 5' non-translated 
region (NTR), a 3'NTR, and a first coding sequence present between the 5' NTR 
and 3'NTR and encoding a hepatitis C virus polyprotein, and a second coding 
sequence encoding a transactivator, wherein the polyprotein comprises an 
isoleucine at about amino acid 2204, and further comprises an adaptive mutation 
selected from the group of an arginine at about amino acid 1067, an arginine at 
about amino acid 1691, valine at about amino acid 2080, an isoleucine at about 
amino acid 1655, an arginine at about amino acid 2040, an arginine at about 
amino acid 1188, and a combination thereof; and

a cell comprising a polynucleotide comprising a transactivated coding 
sequence encoding a detectable marker and an operator sequence operably linked 
to the transactivated coding sequence, wherein the transactivator interacts with the 
operator sequence and alters expression of the transactivated coding sequence. 

38. The method of claim 37 wherein the hepatitis C virus polyprotein is a 
subgenomic hepatitis C virus polyprotein. 

39. The method of claim 37 wherein the hepatitis C virus polyprotein
comprises cleavage products core, E1, E2, P7, NS2, NS3, NS4A, NS4B, NS5A,
and NS5B.

40. The method of claim 37 wherein the 5' NTR, polyprotein, and 3' NTR are
genotype 1a.

61 tcttcacgca gaaagcgtct agccatggcg ttagtatgag tgtcgtgcag cctccaggac
121 cccccctccc gggagagcca tagtggtctg cggaaccggt gagtacaccg gaattgccag 
181 gacgaccggg tcctttcttg gataaacccg ctcaatgcct ggagatttgg gcgtgccccc 
241 gcaagactgc tagccgagta gtgttgggtc gcgaaaggcc ttgtggtact gcctgatagg 
301 gtgcttgcga gtgccccggg aggtctcgta gaccgtgcac catgagcacg aatcctaaac 
361 ctcaaagaaa aaccaaacgt aacaccaacc gtcgcccaca ggacgtcaag ttcccgggtg 
421 gcggtcagat cgttggtgga gtttacttgt tgccgcgcag gggccctaga ttgggtgtgc 
481 gcgcgacgag gaagacttcc gagcggtcgc aacctcgagg tagacgtcag cctatcccca 
541 aggcacgtcg gcccgagggc aggacctggg ctcagcccgg gtacccttgg cccctctatg 
601 gcaatgaggg ttgcgggtgg gcgggatggc tcctgtctcc ccgtggctct cggcctagct 
661 ggggccccac agacccccgg cgtaggtcgc gcaatttggg taaggtcatc gataccctta 
721 cgtgcggctt cgccgacctc atggggtaca taccgctcgt cggcgcccct cttggaggcg 
781 ctgccagggc cctggcgcat ggcgtccggg ttctggaaga cggcgtgaac tatgcaacag 
841 ggaaccttcc tggttgctct ttctctatct tccttctggc cctgctctct tgcctgactg 
901 tgcccgcttc agcctaccaa gtgcgcaatt cctcggggct ttaccatgtc accaatgatt 
961 gccctaactc gagtattgtg tacgaggcgg ccgatgccat cctgcacact ccggggtgtg 
1021 tcccttgcgt tcgcgagggt aacgcctcga ggtgttgggt ggcggtgacc cccacggtgg 
1081 ccaccaggga cggcaaactc cccacaacgc agcttcgacg tcatatcgat ctgcttgtcg 
1141 ggagcgccac cctctgctcg gccctctacg tgggggacct gtgcgggtct gtctttcttg 
1201 ttggtcaact gtttaccttc tctcccaggc gccactggac gacgcaagac tgcaattgtt 
1261 ctatctatcc cggccatata acgggtcatc gcatggcatg ggatatgatg atgaactggt 
1321 cccctacggc agcgttggtg gtagctcagc tgctccggat cccacaagcc atcatggaca 
1381 tgatcgctgg tgctcactgg ggagtcctgg cgggcatagc gtatttctcc atggtgggga 
1441 actgggcgaa ggtcctggta gtgctgctgc tatttgccgg cgtcgacgcg gaaacccacg
1501 tcaccggggg aaatgccggc cgcaccacgg ctgggcttgt tggtctcctt acaccaggcg 
1561 ccaagcagaa catccaactg atcaacacca acggcagttg gcacatcaat agcacggcct 
1621 tgaattgcaa tgaaagcctt aacaccggct ggttagcagg gctcttctat caacacaaat 
1681 tcaactcttc aggctgtcct gagaggttgg ccagctgccg acgccttacc gattttgccc 
1741 agggctgggg tcctatcagt tatgccaacg gaagcggcct cgacgaacgc ccctactgct 
1801 ggcactaccc tccaagacct tgtggcattg tgcccgcaaa gagcgtgtgt ggcccggtat 
1861 attgcttcac tcccagcccc gtggtggtgg gaacgaccga caggtcgggc gcgcctacct 
1921 acagctgggg tgcaaatgat acggatgtct tcgtccttaa caacaccagg ccaccgctgg 
1981 gcaattggtt cggttgtacc tggatgaact caactggatt caccaaagtg tgcggagcgc 
2041 ccccttgtgt catcggaggg gtgggcaaca acaccttgct ctgccccact gattgcttcc 
2101 gcaaacatcc ggaagccaca tactctcggt gcggctccgg tccctggatt acacccaggt 
2161 gcatggtcga ctacccgtat aggctttggc actatccttg taccatcaat tacaccatat
2221 tcaaagtcag gatgtacgtg ggaggggtcg agcacaggct ggaagcggcc tgcaactgga 
2281 cgcggggcga acgctgtgat ctggaagaca gggacaggtc cgagctcagc ccgttgctgc 
2341 tgtccaccac acagtggcag gtccttccgt gttctttcac gaccctgcca gccttgtcca 
2401 ccggcctcat ccacctccac cagaacattg tggacgtgca gtacttgtac ggggtagggt 
2461 caagcatcgc gtcctgggcc attaagtggg agtacgtcgt tctcctgttc cttctgcttg 
2521 cagacgcgcg cgtctgctcc tgcttgtgga tgatgttact catatcccaa gcggaggcgg 
2581 ctttggagaa cctcgtaata ctcaatgcag catccctggc cgggacgcac ggtcttgtgt 
2641 ccttcctcgt gttcttctgc tttgcgtggt atctgaaggg taggtgggtg cccggagcgg 
2701 tctacgccct ctacgggatg tggcctctcc tcctgctcct gctggcgttg cctcagcggg 
2761 catacgcact ggacacggag gtggccgcgt cgtgtggcgg cgttgttctt gtcgggttaa 
2821 tggcgctgac tctgtcgcca tattacaagc gctatatcag ctggtgcatg tggtggcttc 
2881 agtattttct gaccagagta gaagcgcaac tgcacgtgtg ggttcccccc ctcaacgtcc 
2941 ggggggggcg cgatgccgtc atcttactca tgtgtgtagt acacccgacc ctggtatttg 
3001 acatcaccaa actactcctg gccatcttcg gacccctttg gattcttcaa gccagtttgc 
3061 ttaaagtccc ctacttcgtg cgcgttcaag gccttctccg gatctgcgcg ctagcgcgga
3121 agatagccgg aggtcattac gtgcaaatgg ccatcatcaa gttaggggcg cttactggca
3181 cctatgtgta taaccatctc acccctcttc gagactgggc gcacaacggc ctgcgagatc 
3241 tggccgtggc tgtggaacca gtcgtcttct cccgaatgga gaccaagctc atcacgtggg 
3301 gggcagatac cgccgcgtgc ggtgacatca tcaacggctt gcccgtctct gcccgtaggg 
3361 gccaggagat actgcttggg ccagccgacg gaatggtctc caaggggtgg aggttgctgg 
3421 cgcccatcac ggcgtacgcc cagcagacga gaggcctcct agggtgtata atcaccagcc 
3481 tgactggccg ggacaaaaac caagtggagg gtgaggtcca gatcgtgtca actgctaccc 
3541 aaaccttcct ggcaacgtgc atcaatgggg tatgctggac tgtctaccac ggggccggaa 
3601 cgaggaccat cgcatcaccc aagggtcctg tcatccagat gtataccaat gtggaccaag 
3661 accttgtggg ctggcccgct cctcaaggtt cccgctcatt gacaccctgt acctgcggct 
3721 cctcggacct ttacctggtc acgaggcacg ccgatgtcat tcccgtgcgc cggcgaggtg 
3781 atagcagggg tagcctgctt tcgccccggc ccatttccta cttgaaaggc tcctcggggg 
3841 gtccgctgtt gtgccccgcg ggacacgccg tgggcctatt cagggccgcg gtgtgcaccc 
3901 gtggagtggc taaagcggtg gactttatcc ctgtggagaa cctagggaca accatgagat 
3961 ccccggtgtt cacggacaac tcctctccac cagcagtgcc ccagagcttc caggtggccc 
4021 acctgcatgc tcccaccggc agcggtaaga gcaccaaggt cccggctgcg tacgcagccc
4081 agggctacaa ggtgttggtg ctcaacccct ctgttgctgc aacgctgggc tttggtgctt
4141 acatgtccaa ggcccatggg gttgatccta atatcaggac cggggtgaga acaattacca 
4201 ctggcagccc catcacgtac tccacctacg gcaagttcct tgccgacggc gggtgctcag 
4261 gaggtgctta tgacataata atttgtgacg agtgccactc cacggatgcc acatccatct 
4321 tgggcatcgg cactgtcctt gaccaagcag agactgcggg ggcgagactg gttgtgctcg
4381 ccactgctac ccctccgggc tccgtcactg tgtcccatcc taacatcgag gaggttgctc
4441 tgtccaccac cggagagatc cccttttacg gcaaggctat ccccctcgag gtgatcaagg
4501 ggggaagaca tctcatcttc tgccactcaa agaagaagtg cgacgagctc gccgcgaagc
4561 tggtcgcatt gggcatcaat gccgtggcct actaccgcgg tcttgacgtg tctgtcatcc 
4621 cgaccagcgg cgatgttgtc gtcgtgtcga ccgatgctct catgactggc tttaccggcg
4681 acttcgactc tgtgatagac tgcaacacgt gtgtcactca gacagtcgat ttcagccttg
4741 accctacctt taccattgag acaaccacgc tcccccagga tgctgtctcc aggactcaac 
4801 gccggggcag gactggcagg gggaagccag gcatctatag atttgtggca ccgggggagc 
4861 gcccctccgg catgttcgac tcgtccgtcc tctgtgagtg ctatgacgcg ggctgtgctt 
4921 ggtatgagct cacgcccgcc gagactacag ttaggctacg agcgtacatg aacaccccgg
4981 ggcttcccgt gtgccaggac catcttgaat tttgggaggg cgtctttacg ggcctcactc
5041 atatagatgc ccacttttta tcccagacaa agcagagtgg ggagaacttt ccttacctgg 
5101 tagcgtacca agccaccgtg tgcgctaggg ctcaagcccc tcccccatcg tgggaccaga 
5161 tgtggaagtg tttgatccgc cttaaaccca ccctccatgg gccaacaccc ctgctataca 
5221 gactgggcgc tgttcagaat gaagtcaccc tgacgcaccc aatcaccaaa tacatcatga 
5281 catgcatgtc ggccgacctg gaggtcgtca cgagcacctg ggtgctcgtt ggcggcgtcc 
5341 tggctgctct ggccgcgtat tgcctgtcaa caggctgcgt ggtcatagtg ggcaggatcg 
5401 tcttgtccgg gaagccggca attatacctg acagggaggt tctctaccag gagttcgatg 
5461 agatggaaga gtgctctcag cacttaccgt acatcgagca agggatgatg ctcgctgagc
5521 agttcaagca gaaggccctc ggcctcctgc agaccgcgtc ccgccatgca gaggttatca 
5581 cccctgctgt ccagaccaac tggcagaaac tcgaggtctt ttgggcgaag cacatgtgga 
5641 atttcatcag tgggatacaa tacttggcgg gcctgtcaac gctgcctggt aaccccgcca 
5701 ttgcttcatt gatggctttt acagctgccg tcaccagccc actaaccact ggccaaaccc 
5761 tcctcttcaa catattgggg gggtgggtgg ctgcccagct cgccgccccc ggtgccgcta 
5821 ctgcctttgt gggtgctggc ctagctggcg ccgccatcgg cagcgttgga ctggggaagg 
5881 tcctcgtgga cattcttgca gggtatggcg cgggcgtggc gggagctctt gtagcattca 
5941 agatcatgag cggtgaggtc ccctccacgg aggacctggt caatctgctg cccgccatcc 
6001 tctcgcctgg agcccttgta gtcggtgtgg tctgcgcagc aatactgcgc cggcacgttg 
6061 gcccgggcga gggggcagtg caatggatga accggctaat agccttcgcc tcccggggga 
6121 accatgtttc ccccacgcac tacgtgccgg agagcgatgc agccgcccgc gtcactgcca 
6181 tactcagcag cctcactgta acccagctcc tgaggcgact gcatcagtgg ataagctcgg 
6241 agtgtaccac tccatgctcc ggttcctggc taagggacat ctgggactgg atatgcgagg 
6301 tgctgagcga ctttaagacc tggctgaaag ccaagctcat gccacaactg cctgggattc 
6361 cctttgtgtc ctgccagcgc gggtataggg gggtctggcg aggagacggc attatgcaca 
6421 ctcgctgcca ctgtggagct gagatcactg gacatgtcaa aaacgggacg atgaggatcg 
6481 tcggtcctag gacctgcagg aacatgtgga gtgggacgtt ccccattaac gcctacacca 
MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRL
GVRATRKTSERSQPRGRRQPIPKARRPEGRTWAQPGYPWPLYGNEGCGWAGWLLSPRG 
SRPSWGPTDPRRRSRNLGKVIDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLED
GVNYATGNLPGCSFSIFLLALLSCLTVPASAYQVRNSSGLYHVTNDCPNSSIVYEAAD 
AILHTPGCVPCVREGNASRCWVAVTPTVATRDGKLPTTQLRRHIDLLVGSATLCSALY
VGDLCGSVFLVGQLFTFSPRRHWTTQDCNCSIYPGHITGHRMAWDMMMNWSPTAALVV
AQLLRIPQAIMDMIAGAHWGVLAGIAYFSMVGNWAKVLVVLLLFAGVDAETHVTGGNA 
GRTTAGLVGLLTPGAKQNIQLINTNGSWHINSTALNCNESLNTGWLAGLFYQHKFNSS 
GCPERLASCRRLTDFAQGWGPISYANGSGLDERPYCWHYPPRPCGIVPAKSVCGPVYC 
FTPSPVVVGTTDRSGAPTYSWGANDTDVFVLNNTRPPLGNWFGCTWMNSTGFTKVCGA
PPCVIGGVGNNTLLCPTDCFRKHPEATYSRCGSGPWITPRCMVDYPYRLWHYPCTINY 
TIFKVRMYVGGVEHRLEAACNWTRGERCDLEDRDRSELSPLLLSTTQWQVLPCSFTTL 
PALSTGLIHLHQNIVDVQYLYGVGSSIASWAIKWEYVVLLFLLLADARVCSCLWMMLL
ISQAEAALENLVILNAASLAGTHGLVSFLVFFCFAWYLKGRWVPGAVYTUjYGMWPLLL 
LLLALPQRAYALDTEVAASCGGVVLVGLMALTLSPYYKRYISWCMWWLQYFLTRVEAQ
LHVWVPPLNVRGGRDAVILLMCVVHPTLVFDITKLLLAIFGPLWILQASLLKVPYFVR
VQGLLRICALARKIAGGHYVQMAIIKLGALTGTYVYNHLTPLRDWAHNGLRDLAVAVE
PVVFSRMETKLITWGADTAACGDIINGLPVSARRGQEILLGPADGMVSKGWRLLAPIT
AYAQQTRGLLGCIITSLTGRDKNQVEGEVQIVSTATQTFLATCINGVCWTVYHGAGTR 
TIASPKGPVIQMYTNVDQDLVGWPAPQGSRSLTPCTCGSSDLYLVTRHADVIPVRRRG 
DSRGSLLSPRPISYLKGSSGGPLLCPAGHAVGLFRAAVCTRGVAKAVDFIPVENLGTT
MRSPVFTDNSSPPAVPQSFQVAHLHAPTGSGKSTKVPAAYAAQGYKVLVLNPSVAATL 
GFGAYMSKAHGVDPNIRTGVRTITTGSPITYSTYGKFLADGGCSGGAYDIIICDECHS 
TDATSILGIGTVLDQAETAGARLVVLATATPPGSVTVSHPNIEEVALSTTGEIPFYGK
AIPLEVIKGGRHLIFCHSKKKCDELAAKLVALGINAVAYYRGLDVSVIPTSGDVVVVS
TDALMTGFTGDFDSVIDCNTCVTQTVDFSLDPTFTIETTTLPQDAVSRTQRRGRTGRG 
KPGIYRFVAPGERPSGMFDSSVLCECYDAGCAWYELTPAETTVRLRAYMNTPGLPVCQ 
DHLEFWEGVFTGLTHIDAHFLSQTKQSGENFPYLVAYQATVCARAQAPPPSWDQMWKC 
LIRLKPTLHGPTPLLYRLGAVQNEVTLTHPITKYIMTCMSADLEVVTSTWVLVGGVLA



fig- 13? 



ALAAYCLSTGCVVIVGRIVLSGKPAIIPDREVLYQEFDEMEECSQHLPYIEQGMMLAE
QFKQKALGLLQTASRHAEVITPAVQTNWQKLEVFWAKHMWNFISGIQYLAGLSTLPGN 
PAIASLMAFTAAVTSPLTTGQTLLFNILGGWVAAQLAAPGAATAFVGAGLAGAAIGSV 
GLGKVLVDILAGYGAGVAGALVAFKIMSGEVPSTEDLVNLLPAILSPGALVVGVVCAA
ILRRHVGPGEGAVQWMNRLIAFASRGNHVSPTHYVPESDAAARVTAILSSLTVTQLLR 
RLHQWISSECTTPCSGSWLRDIWDWICEVLSDFKTWLKAKLMPQLPGIPFVSCQRGYR 
GVWRGDGIMHTRCHCGAEITGHVKNGTMRIVGPRTCRNMWSGTFPINAYTTGPCTPLP 
APNYKFALWRVSAEEYVEIRRVGDFHYVSGMTTDNLKCPCQIPSPEFFTELDGVRLHR 
FAPPCKPLLREEVSFRVGLHEYPVGSQLPCEPEPDVAVLTSMLTDPSHITAEAAGRRL 
ARGSPPSMASSSASQLSAPSLKATCTANHDSPDAELIEANLLWRQEMGGNITRVESEN 
KVVILDSFDPLVAEEDEREVSVPAEILRKSRRFARALPVWARPDYNPPLVETWKKPDY 
EPPVVHGCPLPPPRSPPVPPPRKKRTVVLTESTLSTALAELATKSFGSSSTSGITGDN
TTTSSEPAPSGCPPDSDVESYSSMPPLEGEPGDPDLSDGSWSTVSSGADTEDVVCCSM 
SYSWTGALVTPCAAEEQKLPINALSNSLLRHHNLVYSTTSRSACQRQKKVTFDRLQVL 
DSHYQDVLKEVKAAASKVKANLLSVEEACSLTPPHSAKSKFGYGAKDVRCHARKAVAH 
INSVWKDLLEDSVTPIDTTIMAKNEVFCVQPEKGGRKPARLIVFPDLGVRVCEKMALY 
DVVSKLPLAVMGSSYGFQYSPGQRVEFLVQAWKSKKTPMGFSYDTRCFDSTVTESDIR
TEEAIYQCCDLDPQARVAIKSLTERLYVGGPLTNSRGENCGYRRCRASGVLTTSCGNT 
LTCYIKARAACRAAGLQDCTMLVCGDDLVVICESAGVQEDAASLRAFTEAMTRYSAPP 
GDPPQPEYDLELITSCSSNVSVAHDGAGKRVYYLTRDPTTPLARAAWETARHTPVNSW 
LGNIIMFAPTLWARMILMTHFFSVLIARDQLEQALNCEIYGACYSIEPLDLPPIIQRL
HGLSAFSLHSYSPGEINRVAACLRKLGVPPLRAWRHRARSVRARLLSRGGRAAICGKY 
LFNWAVRTKLKLTPIAAAGRLDLSGWFTAGYSGGDIYHSVSHARPRWFWFCLLLLAAG 
VGIYLLPNR"
1 gccagccccc tgatgggggc gacactccac catgaatcac tcccctgtga ggaactactg
61 tcttcacgca gaaagcgtct agccatggcg ttagtatgag tgtcgtgcag cctccaggac
121 cccccctccc gggagagcca tagtggtctg cggaaccggt gagtacaccg gaattgccag
181 gacgaccggg tcctttcttg gataaacccg ctcaatgcct ggagatttgg gcgtgccccc
241 gcaagactgc tagccgagta gtgttgggtc gcgaaaggcc ttgtggtact gcctgatagg
301 gtgcttgcga gtgccccggg aggtctcgta gaccgtgcac catgagcacg aatcctaaac
361 ctcaaagaaa aaccaaacgt aacaccaacc gtcgcccaca ggacgtcaag ttcccgggtg
421 gcggtcagat cgttggtgga gtttacttgt tgccgcgcag gggccctaga ttgggtgtgc
481 gcgcgacgag gaagacttcc gagcggtcgc aacctcgagg tagacgtcag cctatcccca
541 aggcacgtcg gcccgagggc aggacctggg ctcagcccgg gtacccttgg cccctctatg
601 gcaatgaggg ttgcgggtgg gcgggatggc tcctgtctcc ccgtggctct cggcctagct
661 ggggccccac agacccccgg cgtaggtcgc gcaatttggg taaggtcatc gataccctta
721 cgtgcggctt cgccgacctc atggggtaca taccgctcgt cggcgcccct cttggaggcg
781 ctgccagggc cctggcgcat ggcgtccggg ttctggaaga cggcgtgaac tatgcaacag
841 ggaaccttcc tggttgctct ttctctatct tccttctggc cctgctctct tgcctgactg
901 tgcccgcttc agcctaccaa gtgcgcaatt cctcggggct ttaccatgtc accaatgatt
961 gccctaactc gagtgttgtg tacgaggcgg ccgatgccat cctgcacact ccggggtgtg
1021 tcccttgcgt tcgcgagggt aacgcctcga ggtgttgggt ggcggtgacc cccacggtgg
1081 ccaccaggga cggcaaactc cccacaacgc agcttcgacg tcatatcgat ctgcttgtcg
1141 ggagcgccac cctctgctcg gccctctacg tgggggacct gtgcgggtct gtctttcttg
1201 ttggtcaact gtttaccttc tctcccaggc accactggac gacgcaagac tgcaattgtt
1261 ctatctatcc cggccatata acgggtcatc gcatggcatg gaatatgatg atgaactggt
1321 cccctacggc agcgttggtg gtagctcagc tgctccgaat cccacaagcc atcatggaca
1381 tgatcgctgg cgcccactgg ggagtcctgg cgggcataaa gtatttctcc atggtgggga
1441 actgggcgaa ggtcctggta gtgctgctgc tatttgccgg cgtcgacgcg gaaacccacg
1501 tcaccggggg aaatgccggc cgcaccacgg ctgggcttgt tggtctcctt acaccaggcg
1561 ccaagcagaa catccaactg atcaacacca acggcagttg gcacatcaat agcacggcct
1621 tgaactgcaa tgaaagcctt aacaccggct ggttagcagg gctcttctat cagcacaaat
1681 tcaactcttc aggctgtcct gagaggttgg ccagctgccg acgccttacc gattttgccc
1741 agggctgggg tcctatcagt tatgccaacg gaagcggcct cgacgaacgc ccctactgct
1801 ggcactaccc tccaagacct tgtggcattg tgcccgcaaa gagcgtgtgt ggcccggtat
1861 attgcttcac tcccagcccc gtggtggtgg gaacgaccga caggtcgggc gcgcctacct
1921 acagctgggg tgcaaatgat acggatgtct tcgtccttaa caacaccagg ccaccgctgg
1981 gcaattggtt cggttgtacc tggatgaact caactggatt caccaaagtg tgcggagcgc
2041 ccccttgtgt catcggaggg gtgggcaaca acaccttgct ctgccccact gattgcttcc
2101 gcaaatatcc ggaagccaca tactctcggt gcggctccgg tcccaggatt acacccaggt
2161 gcatggtcga ctacccgtat aggctttggc actatccttg taccatcaat tacaccatat
2221 tcaaagtcag gatgtacgtg ggaggggtcg agcacaggct ggaagcggcc tgcaactgga
2281 cgcggggcga acgctgtgat ctggaagaca gggacaggtc cgagctcagc ccgttgctgc
2341 tgtccaccac acagtggcag gtccttccgt gttctttcac gaccctgcca gccttgtcca
2401 ccggcctcat ccacctccac cagaacattg tggacgtgca gtacttgtac ggggtagggt
2461 caagcatcgc gtcctgggcc attaagtggg agtacgtcgt tctcctgttc cttctgcttg
2521 cagacgcgcg cgtctgttcc tgcttgtgga tgatgttact catatcccaa gcggaggcgg
2581 ctttggagaa cctcgtaata ctcaatgcag catccctggc cgggacgcat ggtcttgtgt
2641 ccttcctcgt gttcttctgc tttgcgtggt atctgaaggg taggtgggtg cccggagcgg
2701 tctacgccct ctacgggatg tggcctctcc tcctgctcct gctggcgttg cctcagcggg
2761 catacgcact ggacacggag gtggccgcgt cgtgtggcgg cgttgttctt gtcgggttaa
2821 tggcgctgac tctgtcgcca tattacaagc gctatatcag ctggtgcatg tggtggcttc
2881 agtattttct gaccagagta gaagcgcaac tgcacgtgtg ggttcccccc ctcaacgtcc
2941 ggggggggcg cgatgccgtc atcttactca cgtgtgtagt acacccggcc ctggtatttg
3001 acatcaccaa actactcctg gccatcttcg gacccctttg gattcttcaa gccagtttgc
3061 ttaaagtccc ctacttcgtg cgcgttcaag gccttctccg gatctgcgcg ctagcgcgga
3121 agatagccgg aggtcattac gtgcaaatgg ccatcatcaa gttaggggcg cttactggca
3181 cctgtgtgta taaccatctc gctcctcttc gagactgggc gcacaacggc ctgcgagatc 
3241 tggccgtggc tgtggaacca gtcgtcttct cccgaatgga gaccaagctc atcacgtggg 
3301 gggcagatac cgccgcgtgc ggtgacatca tcaacggctt gcccgtctct gcccgtaggg 
3361 gccaggagat actgcttggg ccagccgacg gaatggtctc caaggggtgg aggttgctgg 
3421 cgcccatcac ggcgtacgcc cagcagacga gaggcctcct agggtgtata atcaccagcc 
3481 tgactggccg ggacaaaaac caagtggagg gtgaggtcca gatcgtgtca actgctaccc 
3541 agaccttcct ggcaacgtgc atcaatgggg tatgctggac tgtctaccac ggggccggaa 
3601 cgaggaccat cgcatcaccc aagggtcctg tcatccagac gtataccaat gtggatcaag 
3661 acctcgtggg ctggcccgct cctcaaggtt cccgctcatt gacaccctgc acctgcggct 
3721 cctcggacct ttacctggtc acgaggcacg ccgatgtcat tcccgtgcgc cggcgaggtg 
3781 atagcagggg tagcctgctt tcgccccggc ccatttccta cttgaaaggc tcctcggggg 
3841 gtccgctgtt gtgccccacg ggacacgccg tgggcctatt cagggccgcg gtgtgcaccc 
3901 gtggagtggc taaggcggtg gactttatcc ctgtggagaa cctagagaca accatgagat 
3961 ccccggtgtt cacggacaac tcctctccac cagcagtgcc ccagagcttc caggtggccc 
4021 acctgcatgc tcccaccggc agcggtaaga gcaccaaggt cccggctgcg tacgcagcca 
4081 agggctacaa ggtgttggtg ctcaacccct ctgttgctgc aacactgggc tttggtgctt 
4141 acatgtccaa ggcccatggg gttgatccta atatcaggac cggggtgaga acaattacca 
4201 ctggcagccc catcacgtac tccacctacg gcaagttcct tgccgacgcc gggtgctcag 
4261 gaggtgctta tgacataata atttgtgacg agtgccactc cacggatgcc acatccatct 
4321 cgggcatcgg cactgtcctt gaccaagcag agactgcggg ggcgagactg gttgtgctcg
4381 ccactgctac ccctccgggc tccgtcactg tgtcccatcc taacatcgag gaggttgctc 
4441 tgtccaccac cggagagatc cccttttacg gcaaggctat ccccctcgag gtgatcaagg 
4501 ggggaagaca tctcatcttc tgccactcaa agaagaagtg cgacgagctc gccgcgaagc 
4561 tggtcgcatt gggcatcaat gccgtggcct actaccgcgg tcttgacgtg tctgtcatcc
4621 cgaccagcgg cgatgttgtc gtcgtgtcga ccgatgctct catgactggc tttaccggcg
4681 acttcgactc tgtgatagac tgcaacacgt gtgtcactca gacagtcgat tttagccttg 
4741 accctacctt taccattgag acaaccacgc tcccccagga tgctgtctcc aggactcaac
4801 gccggggcag gactggcagg gggaagccag gcatctatag atttgtggca ccgggggagc 
4861 gcccctccgg catgttcgac tcgtccgtcc tctgtgagtg ctatgacgcg ggctgtgctt 
4921 ggtatgagct cacgcccgcc gagactacag ttaggctacg agcgtacatg aacaccccgg
4981 ggcttcccgt gtgccaggac catcttggat tttgggaggg cgtctttacg ggcctcactc 
5041 atatagatgc ccactttcta tcccagacaa agcagagtgg ggagaacttt ccttacctgg 
5101 tagcgtacca agccaccgtg tgcgctaggg ctcaagcccc tcccccatcg tgggaccaga 
5161 tgcggaagtg tttgatccgc cttaaaccca ccctccatgg gccaacaccc ctgctataca 
5221 gactgggcgc tgttcagaat gaagtcaccc tgacgcaccc aatcaccaaa tacatcatga 
5281 catgcatgtc ggccgacctg gaggtcgtca cgagcacctg ggtgctcgtt ggcggcgtcc 
5341 tggctgctct ggccgcgtat tgcctgtcaa caggctgcgt ggtcatagtg ggcaggatcg 
5401 tcttgtccgg gaagccggca attatacctg acagggaggt tctctaccag gagttcgatg 
5461 agatggaaga gtgctctcag cacttaccgt acatcgagca agggatgatg ctcgctgagc
5521 agttcaagca gaaggccctc ggcctcctgc agaccgcgtc ccgccatgca gaggttatca 
5581 cccctgctgt ccagaccaac tggcagaaac tcgaggtctt ttgggcgaag cacatgtgga 
5641 atttcatcag tgggatacaa tacttggcgg gcctgtcaac gctgcctggt aaccccgcca 
5701 ttgcttcatt gatggctttt acagctgccg tcaccagccc actaaccact ggccaaaccc 
5761 tcctcttcaa catattgggg gggtgggtgg ctgcccagct cgccgccccc ggtgccgcta 
5821 ccgcctttgt gggcgctggc ttagctggcg ccgcactcga cagcgttgga ctggggaagg 
5881 tcctcgtgga cattcttgca ggctatggcg cgggcgtggc gggagctctt gtggcattca 
5941 agatcatgag cggtgaggtc ccctccacgg aggacctggt caatctgctg cccgccatcc 
6001 tctcacctgg agcccttgca gtcggtgtgg tctttgcatc aatactgcgc cggcgtgttg 
6061 gcccgggcga gggggcagtg caatggatga accggctaat agccttcgcc tcccggggga 
6121 accatgtttc ccccacacac tacgtgccgg agagcgatgc agccgcccgc gtcactgcca 
6181 tactcagcag cctcactgta acccagctcc tgaggcgact gcatcagtgg ataagctcgg 
6241 agtgtaccac tccatgctcc ggttcctggc taagggacat ctgggactgg atatgcgagg 
6301 tgctgagcga ctttaagacc tggctgaaag ccaagctcat gccacaactg cctgggattc 
6361 cctttgtgtc ctgccagcgc gggtataggg gggtctggcg aggagacggc attatgcaca 
6421 ctcgctgcca ctgtggagct gagatcactg gacatgtcaa aaacgggacg atgaggatcg 
6481 tcggtcctag gacctgcaag aacatgtgga gtgggacgtt cttcattaat gcctacacca 
6541 cgggcccctg tactcccctt cctgcgccga actataagtt cgcgctgtgg agggtgtctg
6601 cagaggaata cgtggagata aggcgggtgg gggacttcca ctacgtatcg ggcatgacta
6661 ctgacaatct caaatgcccg tgccagatcc catcgcccga atttttcaca gaattggacg
6721 gggtgcgcct acataggttt gcgccccctt gcaagccctt gctgcgggag gaggtatcat
6781 tcagagtagg actccacgag tacccggtgg ggtcgcaatt accttgcgag cccgaaccgg
6841 acgtagccgt gttgacgtcc atgctcactg atccctccca tataacagca gaggcggccg
6901 ggagaaggtt ggcgagaggg tcaccccctt ctatggccag ctcctcggct agccagctgt
6961 ccgctccatc tctcaaggca acttgcaccg ccaaccatga ctcccctgac gccgagctca
7021 tagaggctaa cctcctgtgg aggcaggaga tgggcggcaa catcaccagg gttgagtcag
7081 agaacaaagt ggtgattctg gactccttcg atccgcttgt ggcagaggag gatgagcggg
7141 aggtctccgt acccgcagaa attctgcgga agtctcggag attcgcccca gccctgcccg
7201 tctgggcgcg gccggactac aaccccctgc tagtagagac gtggaaaaag cctgactacg
7261 aaccacctgt ggtccatggc tgcccgctac cacctccacg gtcccctcct gtgcctccgc
7321 ctcggaaaaa gcgtacggtg gtcctcaccg aatcaaccct acctactgcc ttggccgagc
7381 ttgccaccaa aagttttggc agctcctcaa cttccggcat tacgggcgac aatacgacaa
7441 catcctctga gcccgcccct tctggctgcc cccccgactc cgacgttgag tcctattctt
7501 ccatgccccc cctggagggg gagcctgggg atccggatct cagcgacggg tcatggtcga
7561 cggtcagtag tggggccgac acggaagatg tcgtgtgctg ctcaatgtct tattcctgga
7621 caggcgcact cgtcaccccg tgcgctgcgg aggaacaaaa actgcccatc aacgcactga
7681 gcaactcgtt gctacgccat cacaatctgg tgtattccac cacttcacgc agtgcttgcc
7741 aaaggaagaa gaaagtcaca tttgacagac tgcaagttct ggacagccat taccaggacg
7801 tgctcaagga ggtcaaagca gcggcgtcaa aagtgaaggc taacttgcta tccgtagagg
7861 aagcttgcag cctggcgccc ccacattcag ccaaatccaa gtttggctat ggggcaaaag
7921 acgtccgttg ccatgccaga aaggccgtag cccacatcaa ctccgtgtgg aaagaccttc
7981 tggaagacag tgtaacacca atagacacta ccatcatggc caagaacgag gttttctgcg
8041 ttcagcctga gaaggggggt cgtaagccag ctcgtctcat cgtgttcccc gacctgggcg
8101 tgcgcgtgtg cgagaagatg gccctgtacg acgtggttag caagctcccc ttggccgtga
8161 tgggaagctc ctacggattc caatactcac caggacagcg ggttgaattc ctcgtgcaag
8221 cgtggaagtc caagaagacc ccgatggggc tctcgtatga tacccgctgt tttgactcca
8281 cagtcactga gagcgacatc cgtacggagg aggcaattta ccaatgttgt gacctggacc
8341 cccaagcccg cgtggccatc aagtccctca ctgagaggct ttatgttggg ggccctctta
8401 ctaattcaag gggggaaaac tgcggctacc gcaggtgccg cgcgagcaga gtactgacaa
8461 ctagctgtgg taacaccctc actcgctaca tcaaggcccg ggcagcctgt cgagccgcag
8521 ggctccagga ctgcaccatg ctcgtgtgtg gcgacgactt agtcgttatc tgtgaaagtg
8581 cgggggtcca ggaggacgcg gcgagcctga gagccttcac ggaggctatg accaggtact
8641 ccgccccccc cggggacccc ccacaaccag aatacgactt ggagcttata acatcatgct
8701 cctccaacgt gtcagtcgcc cacgacggcg ctggaaagag ggtctactac cttacccgtg
8761 accctacaac ccccctcgcg agagccgcgt gggagacagc aagacacact ccagtcaatt
8821 cctggctagg caacataatc atgtttgccc ccacactgtg ggcgaggatg atactgatga
8881 cccacttctt tagcgtcctc atagccaggg atcagcttga acaggctctc aactgcgaga
8941 tctacggagc ctgctactcc atagaaccac tggatctacc tccaatcatt caaagactcc
9001 atggcctcag cgcattttca ctccacagtt actctccagg tgaaattaat agggtggccg
9061 catgcctcag aaaacttggg gtcccgccct tgcgagcttg gagacaccgg gcctggagcg
9121 tccgcgctag gcttctggcc agaggaggca aggctgccat atgtggcaag tacctcttca
9181 actgggcagt aagaacaaag ctcaaactca ctccgataac ggccgctggc cggctggact
9241 tgtccggctg gttcacggct ggctacagcg ggggagacat ttatcacagc gtgtctcatg
9301 cccggccccg ctggttctgg ttttgcctac tcctgcttgc tgcaggggta ggcatctacc
9361 tcctccccaa ccgatgaaga ttgggctaac cactccaggc caataggcca ttccct
MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRL
GVRATRKTSERSQPRGRRQPIPKARRPEGRTWAQPGYPWPLYGNEGCGWAGWLLSPRG 
SRPSWGPTDPRRRSRNLGKVIDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLED 
GVNYATGNLPGCSFSIFLLALLSCLTVPASAYQVRNSSGLYHVTNDCPNSSVVYEAAD
AILHTPGCVPCVREGNASRCWVAVTPTVATRDGKLPTTQLRRHIDLLVGSATLCSALY
VGDLCGSVFLVGQLFTFSPRHHWTTQDCNCSIYPGHITGHRMAWNMMMNWSPTAALVV 
AQLLRIPQAIMDMIAGAHWGVLAGIKYFSMVGNWAKVLVVLLLFAGVDAETHVTGGNA 
GRTTAGLVGLLTPGAKQNIQLINTNGSWHINSTALNCNESLNTGWLAGLFYQHKFNSS 
GCPERLASCRRLTDFAQGWGPISYANGSGLDERPYCWHYPPRPCGIVPAKSVCGPVYC 
FTPSPVVVGTTDRSGAPTYSWGANDTDVFVLNNTRPPLGNWFGCTWMNSTGFTKVCGA
PPCVIGGVGNNTLLCPTDCFRKYPEATYSRCGSGPRITPRCMVDYPYRLWHYPCTINY
TIFKVRMYVGGVEHRLEAACNWTRGERCDLEDRDRSELSPLLLSTTQWQVLPCSFTTL 
PALSTGLIHLHQNIVDVQYLYGVGSSIASWAIKWEYVVLLFLLLADARVCSCLWMMLL
ISQAEAALENLVILNAASLAGTHGLVSFLVFFCFAWYLKGRWVPGAVYALYGMWPLLL 
LLLALPQRAYALDTEVAASCGGVVLVGLMALTLSPYYKRYISWCMWWLQYFLTRVEAQ
LHVWVPPLNVRGGRDAVILLTCVVHPALVFDITKLLLAIFGPLWILQASLLKVPYFVR
VQGLLRICALARKIAGGHYVQMAIIKLGALTGTCVYNHLAPLRDWAHNGLRDLAVAVE 
PVVFSRMETKLITWGADTAACGDIINGLPVSARRGQEILLGPADGMVSKGWRLLAPIT